Abstract: In this article we consider the estimation of the joint distribution of the random coefficients and error term in the nonparametric random coefficients binary choice model. In this model from economics, each agent has to choose between two mutually exclusive alternatives based on the observation of attributes of the two alternatives and of the agents; the random coefficients account for unobserved heterogeneity of preferences. Because of the scale invariance of the model, we want to estimate the density of a random vector of Euclidean norm 1. If the regressors and coefficients are independent, the choice probability conditional on a vector of $d-1$ regressors is an integral of the joint density on half a hypersphere determined by the regressors. Estimation of the joint density is an ill-posed inverse problem where the operator that has to be inverted is the so-called hemispherical transform. We derive lower bounds on the minimax risk under $L^p$ losses and smoothness expressed in terms of Besov spaces on the sphere $\mathbb{S}^{d-1}$. We then consider a needlet thresholded estimator with data-driven thresholds and obtain adaptivity for $L^p$ losses and Besov ellipsoids under assumptions on the random design.

Key-words: Discrete choice models, random coefficients, inverse problems, minimax rate optimality, adaptation, needlets, data-driven thresholding.
Adaptive estimation in the nonparametric random coefficients binary choice model by needlet thresholding



1 Introduction

Discrete choice models are important models in economics for the choice of agents between a number of exhaustive and mutually exclusive alternatives. They have applications in many areas, ranging from empirical industrial organization, labor economics and health economics to the planning of public transportation and the evaluation of public policies. For a review, the interested reader can refer to the Nobel lecture of D. McFadden [24]. We consider here a binary choice model where individuals only have two options. In a random utility framework, an agent chooses the alternative that yields the higher utility. Assume that the utility for each alternative is linear in regressors which are observed by the statistician. The regressors are typically attributes of the alternatives faced by the individuals, e.g. the cost or time to commute from home to one's office for each of the two transport alternatives. Because this linear structure is an ideal situation and because the statistician is missing some factors, the utilities are written as the linear combination of the regressors plus some random error term. When the utility difference is positive the agent chooses the first alternative; otherwise he chooses the second. The Logit, Probit and Mixed-Logit models are particular models of this type. We consider the case where the coefficients of the regressors are random. This accounts for heterogeneity or taste variation: each individual is allowed to have his own set of coefficients (the preferences or tastes). Like in [8, 13], we consider a nonparametric treatment of the joint distribution of the error term and the vector of random coefficients.

Nonparametric treatment of unobserved heterogeneity is very important in economics; references include [2, 4, 7, 8, 11, 12, 13]. It allows one to be extremely flexible about the joint distribution of the preferences (as well as the error term). [7] considers treatment effects models with random coefficients in the case where the allocation to treatment corresponds to a decision mechanism formulated in the form of model (1) below. Random coefficients models can be viewed as mixture models. They also have a Bayesian interpretation; see for example [10] for a model similar to (1) on the sphere. Nonparametric estimation of the density of the vector of random coefficients corresponds to nonparametric estimation of a prior in the empirical Bayes setting.

In the nonparametric random coefficients binary choice model we assume that we have $n$ i.i.d. observations $(x_i, y_i)$ of $(X, Y)$, where $X$ is a random vector of Euclidean norm 1 in $\mathbb{R}^d$ and $Y$ is a discrete random variable, and $Y$ and $X$ are related through an unobserved random vector $\beta$ of norm 1 by

$$Y = 2\,\mathbb{1}\{\langle X, \beta\rangle > 0\} - 1 = \begin{cases} 1 & \text{if } X \text{ and } \beta \text{ are in the same hemisphere},\\ -1 & \text{otherwise}. \end{cases} \qquad (1)$$

In (1), $\langle\cdot,\cdot\rangle$ is the scalar product in $\mathbb{R}^d$. We make the assumption that $X$ and $\beta$ are independent. This assumption corresponds to the exogeneity of the regressors. It could be relaxed using instrumental variables (see [8]). $-1$ and $1$ are labels for the two choices; they correspond to the sign of $\langle X, \beta\rangle$. $X$ and $\beta$ are assumed to be of norm 1 because only the sign of $\langle X, \beta\rangle$ matters in the choice mechanism. The regressors in the latent variable model are thus assumed to be properly rescaled. Model (1) allows for arbitrary dependence between the random unobservables. In this model, $X$ corresponds to a vector of regressors where, in an original scale, the first component is 1 and the remaining components are the regressors in the binary choice model. The 1 is there because in applications we always include a constant in the latent variable model for the binary choice model. The first element of $\beta$ in this formulation absorbs the usual error term as well as the constant in standard binary choice models with non-random coefficients. We assume that $X$ and $\beta$ have densities $f_X$ and $f_\beta$ with respect to the spherical measure $\sigma$ on the unit sphere $\mathbb{S}^{d-1}$ of the Euclidean space $\mathbb{R}^d$. Because in the original scale the first component of $X$ is 1, the support of $X$ is included in $H^+ = \{x \in \mathbb{S}^{d-1} : \langle x, (1, 0, \dots, 0)\rangle > 0\}$. We assume, for simplicity, throughout this paper, that the support of $X$ satisfies $\mathrm{supp}\, f_X = H^+$. In [8], the case of regressors with limited support, including dummy variables, is also studied, but identification requires that these variables, as well as one continuously distributed regressor, are not multiplied by random coefficients.

The estimation of the density of the random coefficients can be viewed as a linear ill-posed inverse problem. We can write, for $x \in H^+$,

$$\mathbb{E}[Y \mid X = x] = \int_{\mathbb{S}^{d-1}} \mathrm{sign}(\langle x, b\rangle)\, f_\beta(b)\, d\sigma(b), \qquad (2)$$
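where sign denotes the sign function. As recalled in [27], if $\varphi$ is homogeneous of degree $-d$, i.e. there exists a function $f$ on $\mathbb{S}^{d-1}$ such that $\varphi(x) = |x|^{-d} f(x/|x|)$, where $|\cdot|$ is the Euclidean norm, then

$$\frac{2}{\pi}\,\mathrm{Im}\int_{\mathbb{R}^d} \varphi(y)\, e^{i\langle x, y\rangle}\, dy = \int_{b \in \mathbb{S}^{d-1}} \mathrm{sign}\left(\left\langle \frac{x}{|x|}, b\right\rangle\right) f(b)\, d\sigma(b). \qquad (3)$$

We can rewrite this in terms of another operator from integral geometry:

$$\mathbb{P}(Y = 1 \mid X = x) = \frac{\mathbb{E}[Y \mid X = x] + 1}{2} = \int_{\mathbb{S}^{d-1}} \mathbb{1}\{\langle x, b\rangle > 0\}\, f_\beta(b)\, d\sigma(b) \triangleq \mathcal{H}(f_\beta)(x). \qquad (4)$$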
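As a minimal numerical sanity check of (4), the following Python sketch takes $d = 2$, identifies $\mathbb{S}^1$ with angles, and assumes (for illustration only) that $\beta$ is uniform on the arc of angles $[0, \pi/2]$, so that $f_\beta = 2/\pi$ there. For a fixed design point $x$, it simulates $Y$ from (1) and compares $\mathbb{P}(Y = 1 \mid X = x)$ and $(\mathbb{E}[Y \mid X = x] + 1)/2$ with $\mathcal{H}(f_\beta)(x)$, the $f_\beta$-mass of the half-circle centred at $x$. The choice of $f_\beta$ and the function names are hypothetical.

```python
import numpy as np

def hemispherical_transform_arc(theta_x):
    """H(f_beta)(x) on S^1 when beta is uniform on the arc [0, pi/2] (density 2/pi).

    H(f)(x) is the f-mass of the half-circle of angles (theta_x - pi/2, theta_x + pi/2),
    i.e. the length of its intersection with [0, pi/2], times 2/pi.
    """
    lo = max(theta_x - np.pi / 2, 0.0)
    hi = min(theta_x + np.pi / 2, np.pi / 2)
    return (2 / np.pi) * max(hi - lo, 0.0)

def simulate_choices(theta_x, n, rng):
    """Draw beta ~ uniform on [0, pi/2] and return Y = 2 * 1{<x, beta> > 0} - 1, as in (1)."""
    phi = rng.uniform(0.0, np.pi / 2, size=n)
    inner = np.cos(theta_x - phi)          # <x, beta> for unit vectors at angles theta_x, phi
    return np.where(inner > 0, 1, -1)

rng = np.random.default_rng(0)
theta_x = -np.pi / 4                        # a design point in H^+ = {|theta| < pi/2}
y = simulate_choices(theta_x, 200_000, rng)
p_hat = np.mean(y == 1)                     # empirical P(Y = 1 | X = x)
e_hat = np.mean(y)                          # empirical E[Y | X = x]
print(p_hat, (e_hat + 1) / 2, hemispherical_transform_arc(theta_x))
```

Here $\mathcal{H}(f_\beta)(x) = 1/2$, since only the angles $\varphi \in [0, \pi/4)$ put $\beta$ in the hemisphere centred at $x$.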

The operator $\mathcal{H}$ is called the hemispherical transform. It is a special case of the Pompeiu operator (see, e.g., [30]). The operator $\mathcal{H}$ arises when one wants to reconstruct a star-shaped body from its half-volumes (see [5]). Inversion of this operator was studied in [5, 27]. It can be achieved in the spherical harmonic basis (also called the Fourier-Laplace basis, as the extension of the Fourier basis on $\mathbb{S}^1$ and the Laplace basis on $\mathbb{S}^2$), using polynomials in the Laplace-Beltrami operator for certain dimensions, and using a continuous wavelet transform. [27], and to a certain extent [9], also discuss some of its properties. It is an operator which is diagonal in the spherical harmonic basis and whose eigenvalues are known explicitly. The estimation problem is a deconvolution problem on the sphere where the left-hand side is not a density but a regression function with random design. Deconvolution on the sphere has been studied by various authors, among which [10, 16, 21]. Because of the indicator function, this is a type of boxcar deconvolution. Boxcar deconvolution has been studied in specific cases in [14, 20]. There are two important difficulties regarding identification: (i) because of the intercept in the latent variable model, the left-hand side of (4) is not a function defined on the whole sphere; (ii) $\mathcal{H}$ is not injective (this can easily be seen from (3), where $\varphi$ cannot be identified from only the imaginary part of its Fourier transform, even less so when $X$ has limited support). Proper restrictions are imposed to identify $f_\beta$. Treatment of the random design (possibly inhomogeneous) with unknown distribution, which appears in the regression function that has to be inverted, is an important difficulty. Regression with random design is a difficult problem; see for example [18, 23] for the case of wavelet thresholding estimation using warped wavelets for a regression model on an interval, or [6] in the case of inhomogeneous designs. [8] propose an estimator using smoothed projections on the finite dimensional spaces spanned by the first vectors of the spherical harmonic basis. It is straightforward to compute in every dimension $d$ (the specific tools are recalled in Section 2.1). Convergence rates for the $L^p$-losses for $p \in [1, \infty]$ and a CLT are obtained in [8]. They depend on the degree of smoothing of the operator, which is $\nu = d/2$ in the Sobolev spaces based on $L^2$, on the smoothness of the unknown function, and on the smoothness of $f_X$ as well as its degeneracy (when it takes small values or is 0, in particular when $x$ approaches the boundary of $H^+$). The treatment of the random design is a major difficulty that we deal with in this paper.

The goal of this paper is to provide an estimator of $f_\beta$ which is adaptive to the unknown smoothness of the function. Needlets are localized frames built on the spherical harmonic basis; they were introduced in [26]. They were successfully used in statistics to provide adaptive estimation procedures in [1, 17, 19]. As they are built on the spherical harmonic basis, they are very well suited for deconvolution on the sphere; this was used in [21]. Unlike these articles, and in the spirit of [3], we propose a more accurate data-driven thresholding method.
2 Preliminaries

We use the notation $x \wedge y$ and $x \vee y$ for, respectively, the minimum and the maximum of $x$ and $y$. We write $x \lesssim y$ when there exists a positive constant $c$ such that $x \leq cy$, and $x \gtrsim y$ when there exists a positive constant $c$ such that $x \geq cy$. We also write $x \asymp y$ when $x \lesssim y$ and $x \gtrsim y$.



2.1 Harmonic analysis on the sphere 

We denote by $L^p(\mathbb{S}^{d-1})$ the space of real valued $p$-integrable functions with respect to the spherical measure $\sigma$, and we denote the $L^p$-norm by $\|\cdot\|_p$. $L^2(\mathbb{S}^{d-1})$ is a Hilbert space with the classical scalar product.
Every function in $L^2(\mathbb{S}^{d-1})$ can be decomposed in the following way:

$$f = f^+ + f^-,$$

where

$$f^+(b) = \frac{f(b) + f(-b)}{2} \quad \text{and} \quad f^-(b) = \frac{f(b) - f(-b)}{2}.$$

$f^+$ (resp. $f^-$) is the even (resp. odd) part of the function $f$ (taking limits of functions which are well defined pointwise). We can write the orthogonal sum

$$L^2(\mathbb{S}^{d-1}) = L^2_{\mathrm{even}}(\mathbb{S}^{d-1}) \oplus L^2_{\mathrm{odd}}(\mathbb{S}^{d-1}).$$

It can be further decomposed as the orthogonal sum

$$L^2(\mathbb{S}^{d-1}) = \bigoplus_{k \in \mathbb{N}} H^{k,d},$$

where the $H^{k,d}$ are the eigenspaces of the Laplace-Beltrami operator $\Delta$ on the sphere, corresponding to the eigenvalues $\zeta_{k,d} = k(k+d-2)$ of $-\Delta$. The spaces $H^{k,d}$ are of dimension

$$L(k,d) = \frac{(2k+d-2)\,(k+d-2)!}{k!\,(d-2)!\,(k+d-2)}.$$

Each such finite dimensional space is generated by an orthonormal basis of spherical harmonics of degree $k$ that we denote by $(h_{k,l})_{l=1}^{L(k,d)}$. $L^2_{\mathrm{odd}}(\mathbb{S}^{d-1})$ (resp. $L^2_{\mathrm{even}}(\mathbb{S}^{d-1})$) is the orthogonal sum of the $H^{k,d}$ for $k$ odd (resp. even). The space $H^{0,d}$ of spherical harmonics of degree 0 is the one dimensional space spanned by the constant function 1. The projector $L_{k,d}$ onto $H^{k,d}$ is a kernel operator with kernel

$$L_{k,d}(x,y) = \sum_{l=1}^{L(k,d)} h_{k,l}(x)\, h_{k,l}(y), \qquad (5)$$

having the simple expression

$$L_{k,d}(x,y) = \mathcal{L}_{k,d}(\langle x, y\rangle), \qquad \mathcal{L}_{k,d}(t) = \frac{k + \mu(d)}{\mu(d)\,|\mathbb{S}^{d-1}|}\, C^{\mu(d)}_k(t) \qquad \text{(Addition Formula)} \qquad (6)$$

where the $C^{\mu}_k$ are the Gegenbauer polynomials and $\mu(d) = (d-2)/2$. The Gegenbauer polynomials are defined for $\mu > -1/2$ and are orthogonal with respect to the weight function $(1-t^2)^{\mu - 1/2}\,dt$ on $[-1, 1]$. $C^{\mu}_0(t) = 1$ and $C^{\mu}_1(t) = 2\mu t$ for $\mu \neq 0$, while $C^{0}_1(t) = 2t$. They satisfy the recursion relation

$$(k+2)\, C^{\mu}_{k+2}(t) = 2(\mu + k + 1)\, t\, C^{\mu}_{k+1}(t) - (2\mu + k)\, C^{\mu}_k(t). \qquad (7)$$

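It is classical (and follows easily from (5)) that the squared $L^2$-norm with respect to either one of the arguments of the kernel is a constant:

$$\forall x \in \mathbb{S}^{d-1}, \quad \|L_{k,d}(x, \cdot)\|_2^2 = \sum_{l=1}^{L(k,d)} |h_{k,l}(x)|^2 = \frac{L(k,d)}{|\mathbb{S}^{d-1}|}. \qquad (8)$$

Recall that $|\mathbb{S}^{d-1}| = 2\pi^{d/2}/\Gamma(d/2)$. The condensed harmonic expansion of a function $f$ in $L^2(\mathbb{S}^{d-1})$ is the expansion

$$f = \sum_{k=0}^{\infty} L_{k,d} f.$$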
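The Addition Formula (6) and the recursion (7) are enough to evaluate the kernels $L_{k,d}$ in any dimension. The Python sketch below assumes $d \geq 3$, so that $\mu(d) = (d-2)/2 > 0$. It evaluates $C^{\mu}_k$ with (7), builds $\mathcal{L}_{k,d}$ from (6), and checks (8) in the form $\mathcal{L}_{k,d}(1) = L(k,d)/|\mathbb{S}^{d-1}|$, which is equivalent because $\|L_{k,d}(x,\cdot)\|_2^2 = L_{k,d}(x,x)$ by (5). The function names are illustrative.

```python
import math
import numpy as np

def sphere_area(d):
    """|S^{d-1}| = 2 pi^{d/2} / Gamma(d/2)."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

def dim_harmonics(k, d):
    """L(k, d): dimension of H^{k,d} (exact integer arithmetic, d >= 3)."""
    return (2 * k + d - 2) * math.factorial(k + d - 2) // (
        math.factorial(k) * math.factorial(d - 2) * (k + d - 2))

def gegenbauer(k, mu, t):
    """C_k^mu(t) for mu > 0, from C_0 = 1, C_1 = 2 mu t and the recursion (7)."""
    t = np.asarray(t, dtype=float)
    c_prev, c_curr = np.ones_like(t), 2.0 * mu * t
    if k == 0:
        return c_prev
    for m in range(k - 1):
        # (m + 2) C_{m+2} = 2 (mu + m + 1) t C_{m+1} - (2 mu + m) C_m
        c_prev, c_curr = c_curr, (2 * (mu + m + 1) * t * c_curr - (2 * mu + m) * c_prev) / (m + 2)
    return c_curr

def zonal_kernel(k, d, t):
    """Addition Formula (6): L_{k,d}(x, y) = calL_{k,d}(<x, y>), here for d >= 3."""
    mu = (d - 2) / 2
    return (k + mu) / (mu * sphere_area(d)) * gegenbauer(k, mu, t)

# Check (8): ||L_{k,d}(x, .)||_2^2 = calL_{k,d}(1) = L(k, d) / |S^{d-1}|.
for d in (3, 4, 5):
    for k in range(6):
        print(d, k, float(zonal_kernel(k, d, 1.0)), dim_harmonics(k, d) / sphere_area(d))
```

For $d = 3$ this reduces to the familiar $\mathcal{L}_{k,3}(t) = \frac{2k+1}{4\pi} P_k(t)$ with $P_k$ the Legendre polynomials.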

In [8], smoothed projection operators are used; they have good approximation properties in all $L^p(\mathbb{S}^{d-1})$ spaces and are uniformly bounded from $L^p$ to $L^p$ (the $L^1$-norm of the kernel is uniformly bounded). They are obtained using a proper damping of the high frequencies. One such operator is the delayed means ([8] also considers the Riesz means). It is obtained via a $C^\infty$ and decreasing function $a$ on $\mathbb{R}^+$ supported on $[0, 2]$, such that $\forall t \in [0, 2]$, $0 \leq a(t) \leq 1$ and $\forall t \in [0, 1]$, $a(t) = 1$. The delayed means are defined through the kernels

$$K^a_j(x, y) = \sum_{k=0}^{\infty} a\left(\frac{k}{2^j}\right) L_{k,d}(x, y). \qquad (9)$$

These kernels have nearly exponential localization properties (see Theorem 2.2 in [26]). They are the building blocks for the construction of needlets in [26].

2.2 Needlets and Besov spaces 

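Define $b$ such that

$$\forall t \in \mathbb{R}^+, \quad b^2(t) = a(t) - a(2t).$$

It is nonzero only when $1/2 < t < 2$ and satisfies $\forall t \in [1/2, 1]$, $b^2(t) + b^2(2t) = 1$, and thus

$$\forall t \geq 1, \quad \sum_{j=0}^{\infty} b^2\left(\frac{t}{2^j}\right) = 1.$$

We assume as well that for some positive $c$, $b(t) \geq c$ if $t \in [3/5, 5/3]$. The needlets are the functions

$$\psi_{j,\xi}(x) = \sqrt{\omega(j,\xi)} \sum_{k=0}^{\infty} b\left(\frac{k}{2^{j-1}}\right) L_{k,d}(\xi, x) \quad \text{if } j \geq 1,\ \xi \in \Xi_j, \qquad (10)$$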
RR n° 7647 
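$$\psi_{0,\xi}(x) = \sqrt{|\mathbb{S}^{d-1}|}\, L_{0,d}(\xi, x) = |\mathbb{S}^{d-1}|^{-1/2}, \qquad (11)$$

where for all $j \in \mathbb{N}$, $\Xi_j$ and $(\omega(j,\xi))_{\xi \in \Xi_j}$ are respectively the nodes and positive weights of a quadrature formula on the sphere that integrates exactly all functions in $\bigoplus_{k=0}^{2^{j+1}} H^{k,d}$, and satisfy, for some positive $c_\star$ and $C_\star$,

$$\forall j \in \mathbb{N},\ c_\star 2^{j(d-1)} \leq |\Xi_j| \leq C_\star 2^{j(d-1)}, \qquad \forall j \in \mathbb{N},\ \forall \xi \in \Xi_j,\ c_\star 2^{-j(d-1)} \leq \omega(j,\xi) \leq C_\star 2^{-j(d-1)},$$

where $|\Xi_j|$ denotes the cardinality of the set $\Xi_j$. The quadrature formula is given in Corollary 2.9 of [26]. Note that for $j = 0$, $\psi_{0,\xi}$ is constant and one takes $\Xi_0$ as a singleton. Note also that the Addition Formula is a very useful tool because the needlets, unlike the spherical harmonics, have a simple expression in every dimension. The $L^p$-norms of the needlets satisfy, for constants $c_p$ and $C_p$ uniform in $j$ and $\xi$,

$$c_p\, 2^{j(d-1)(1/2 - 1/p)} \leq \|\psi_{j,\xi}\|_p \leq C_p\, 2^{j(d-1)(1/2 - 1/p)}; \qquad (12)$$

this is a consequence of the following localization property around the nodes of the quadrature formula: for every $l \in \mathbb{N}$ there exists $C_l$ such that

$$\forall \eta \in \mathbb{S}^{d-1}, \quad |\psi_{j,\xi}(\eta)| \leq \frac{C_l\, 2^{j(d-1)/2}}{\left(1 + 2^j \arccos\langle \xi, \eta\rangle\right)^l}. \qquad (13)$$

If $f \in L^p(\mathbb{S}^{d-1})$ for $p \in [1, \infty]$, then

$$f = \sum_{j=0}^{\infty} \sum_{\xi \in \Xi_j} \langle f, \psi_{j,\xi}\rangle\, \psi_{j,\xi}$$

in $L^p(\mathbb{S}^{d-1})$. The needlets form a tight frame:

$$\forall f \in L^2(\mathbb{S}^{d-1}), \quad \|f\|_2^2 = \sum_{j=0}^{\infty} \sum_{\xi \in \Xi_j} |\langle f, \psi_{j,\xi}\rangle|^2.$$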
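The window $a$ is only constrained by smoothness, monotonicity and support conditions. The Python sketch below picks one standard $C^\infty$ choice, which is an assumption and not necessarily the window used in this paper. It builds $b$ from $b^2(t) = a(t) - a(2t)$ and checks the partition of unity $\sum_j b^2(t/2^j) = 1$. It then checks numerically on $\mathbb{S}^1$ ($d = 2$, where $L_{0,2} = 1/(2\pi)$ and $L_{k,2}(\xi, x) = \cos(k(x - \xi))/\pi$ for $k \geq 1$) that needlets built this way form a tight frame. Equally spaced nodes with $4 \cdot 2^j$ points and equal weights are a hypothetical stand-in for the quadrature formula of [26]; they integrate exactly the trigonometric polynomials that occur at level $j$.

```python
import numpy as np

def smooth_step(u):
    """exp(-1/u) for u > 0 and 0 otherwise: the usual C^infinity building block."""
    u = np.asarray(u, dtype=float)
    return np.where(u > 0, np.exp(-1.0 / np.where(u > 0, u, 1.0)), 0.0)

def window_a(t):
    """A C^infinity decreasing window with a = 1 on [0, 1] and a = 0 on [2, inf)."""
    t = np.asarray(t, dtype=float)
    return smooth_step(2.0 - t) / (smooth_step(2.0 - t) + smooth_step(t - 1.0))

def window_b(t):
    """b >= 0 with b^2(t) = a(t) - a(2t); nonzero only for 1/2 < t < 2."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.clip(window_a(t) - window_a(2.0 * t), 0.0, None))

def needlet_energy(fourier, J):
    """Sum over j <= J and xi in Xi_j of |<f, psi_{j,xi}>|^2 for a function f on S^1.

    `fourier` maps k to (a_k, b_k), with f(x) = a_0 + sum_k (a_k cos kx + b_k sin kx).
    """
    a0 = fourier.get(0, (0.0, 0.0))[0]
    energy = (np.sqrt(2 * np.pi) * a0) ** 2            # j = 0: psi_0 = 1 / sqrt(2 pi)
    for j in range(1, J + 1):
        n_nodes = 4 * 2 ** j
        xi = 2 * np.pi * np.arange(n_nodes) / n_nodes  # equally spaced nodes
        omega = 2 * np.pi / n_nodes                    # equal quadrature weights
        coef = np.zeros(n_nodes)
        for k, (ak, bk) in fourier.items():
            if k >= 1:
                # <f, psi_{j,xi}> = sqrt(omega) * sum_k b(k / 2^{j-1}) (L_{k,2} f)(xi)
                coef += window_b(k / 2 ** (j - 1)) * (ak * np.cos(k * xi) + bk * np.sin(k * xi))
        energy += np.sum((np.sqrt(omega) * coef) ** 2)
    return energy

f = {0: (0.5, 0.0), 1: (1.0, -0.3), 3: (0.2, 0.4), 7: (-0.6, 0.1)}
norm_sq = 2 * np.pi * 0.5 ** 2 + np.pi * sum(ak ** 2 + bk ** 2 for k, (ak, bk) in f.items() if k >= 1)
print(needlet_energy(f, J=5), norm_sq)                 # tight frame: the two numbers agree
```

With $J = 5$ all degrees $k \leq 2^{J-1} = 16$ are fully captured, because $\sum_{j=1}^{J} b^2(k/2^{j-1})$ telescopes to $a(k/2^{J-1}) - a(2k) = 1$.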



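In the sequel, we denote by $\|\cdot\|_{\ell^p}$ the $\ell^p$-norm of a vector. The following lemma from [1] is useful in the analysis.

Lemma 1. (i) For every $p \in (0, \infty]$, there exists a positive constant $C'_p$ such that

$$\left\|\sum_{\xi \in \Xi_j} u_\xi\, \psi_{j,\xi}\right\|_p \leq C'_p\, 2^{j(d-1)(1/2 - 1/p)} \left\|(u_\xi)_{\xi \in \Xi_j}\right\|_{\ell^p}. \qquad (14)$$

(ii) There exist a constant $c_A$ and subsets $A_j \subset \Xi_j$ with $|A_j| \geq c_A 2^{j(d-1)}$ such that for every $p \in (0, \infty]$, there exists a positive constant $c'_p$ such that

$$\left\|\sum_{\xi \in A_j} u_\xi\, \psi_{j,\xi}\right\|_p \geq c'_p\, 2^{j(d-1)(1/2 - 1/p)} \left\|(u_\xi)_{\xi \in A_j}\right\|_{\ell^p}. \qquad (15)$$

(iii) For every $p \in [1, \infty]$, there exists a positive constant $C''_p$ such that

[display (16) not recoverable from the source] (16)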



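[26] discuss three formulations of the Besov spaces $B^s_{p,q}$ on the sphere. The Besov spaces will be our scales of smoothness for the adaptive estimation. One characterization is in terms of the approximation error. If $s > 0$, $p \in [1, \infty]$ and $q \in (0, \infty]$, $f$ belongs to $B^s_{p,q}$ if and only if $f$ is in $L^p(\mathbb{S}^{d-1})$ and

$$\|f\|_{B^s_{p,q}} \triangleq \|f\|_p + \left\|\left(2^{js} E_{2^j}(f, p)\right)_{j \in \mathbb{N}}\right\|_{\ell^q} < \infty,$$

where

$$E_K(f, p) = \inf_{P \in \bigoplus_{k=0}^{K} H^{k,d}} \|f - P\|_p.$$

Whatever the function $a$ in the definition of the smoothed projection operators, the above norm is equivalent to the following sequence space norm, where $\beta_{j,\xi} = \langle f, \psi_{j,\xi}\rangle$:

$$\left\|\left(2^{j(s + (d-1)(1/2 - 1/p))} \left\|(\beta_{j,\xi})_{\xi \in \Xi_j}\right\|_{\ell^p}\right)_{j \in \mathbb{N}}\right\|_{\ell^q}.$$

We denote by $B^s_{p,q}(M)$ the ball of radius $M$ for the above norm in $B^s_{p,q}$. From the proof of the continuous embeddings in [1] we can easily get:

Lemma 2. If $p \leq r \leq \infty$, $B^s_{r,q}(M) \subset B^s_{p,q}\left(C_\star^{1/p - 1/r} M\right)$.

If $s > (d-1)(1/r - 1/p)$ and $r \leq p \leq \infty$, $B^s_{r,q}(M) \subset B^{s - (d-1)(1/r - 1/p)}_{p,q}(M)$.

If $f \in B^s_{r,q}(M)$ and $(\beta_{j,\xi})_{j \in \mathbb{N}, \xi \in \Xi_j}$ are its needlet coefficients, then

$$\forall j \in \mathbb{N}, \quad \left\|(\beta_{j,\xi})_{\xi \in \Xi_j}\right\|_{\ell^r} \leq 2^{-j(s + (d-1)(1/2 - 1/r))} D_j, \qquad (17)$$

where $\forall j \in \mathbb{N}$, $D_j \geq 0$, $(D_j)_{j \in \mathbb{N}} \in \ell^q$ and $\|(D_j)_{j \in \mathbb{N}}\|_{\ell^q} \leq M$.

Note that $\|(D_j)_{j \in \mathbb{N}}\|_{\ell^q} \leq M$ implies that $\forall j \in \mathbb{N}$, $D_j \leq M$. Recall as well that, when $f$ belongs to $B^s_{p,q}$ with $s > (d-1)/p$, then $f$ is continuous and bounded.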
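The needlet sequence-space Besov norm is straightforward to evaluate once the coefficients are grouped by level. A minimal Python sketch, in which the coefficients are passed as a hypothetical list `coeffs`, with `coeffs[j]` holding $(\beta_{j,\xi})_{\xi \in \Xi_j}$, and $p = \infty$ or $q = \infty$ is encoded by `math.inf`:

```python
import math
import numpy as np

def lp_norm(v, p):
    """l^p norm of a vector, with p = math.inf for the sup norm."""
    v = np.abs(np.asarray(v, dtype=float))
    return float(v.max()) if p == math.inf else float(np.sum(v ** p) ** (1.0 / p))

def besov_sequence_norm(coeffs, s, p, q, d):
    """|| ( 2^{j(s + (d-1)(1/2 - 1/p))} ||(beta_{j,xi})_xi||_{l^p} )_j ||_{l^q}."""
    inv_p = 0.0 if p == math.inf else 1.0 / p
    level_terms = [2.0 ** (j * (s + (d - 1) * (0.5 - inv_p))) * lp_norm(c, p)
                   for j, c in enumerate(coeffs)]
    return lp_norm(level_terms, q)

coeffs = [np.array([1.0]), np.array([0.5, -0.5]), np.array([0.25, 0.0, 0.0, 0.25])]
print(besov_sequence_norm(coeffs, s=1.0, p=2, q=2, d=2))    # sqrt(1 + 2 + 2)
```

The weight $2^{j(d-1)(1/2 - 1/p)}$ compensates for the $L^p$-norms (12) of the needlets, so that each level term is comparable to the $L^p$-norm of that level's contribution to $f$.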

2.3 The hemispherical transform 

The hemispherical transform is a mapping from $L^2(\mathbb{S}^{d-1})$ to $L^2(\mathbb{S}^{d-1})$ which maps a function $f$ to a function which, evaluated at $x \in \mathbb{S}^{d-1}$, is the integral of the original function on the hemisphere $\{y \in \mathbb{S}^{d-1} : \langle x, y\rangle > 0\}$. It is a special case of the Pompeiu operator and is strongly related to the spherical Radon transform. Several inversion formulas as well as properties of this mapping are given in [27]. These inversion formulas include polynomials in the spherical Laplacian (for certain dimensions) and a continuous wavelet transform; the known inversion formula in the spherical harmonic basis is also recalled there. We make use of the latter because the needlet frame is very well suited to this decomposition.

A consequence of the Funk-Hecke theorem (see, e.g., [9]) is that $\mathcal{H}$ is a diagonal operator in the spherical harmonic basis $(h_{k,l})_{l = 1, \dots, L(k,d),\, k \in \mathbb{N}}$, with the same eigenvalue on each space $H^{k,d}$. We thus only index the eigenvalues by the degree of the harmonics.

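Proposition 3. $\mathcal{H}$ is a self-adjoint operator on $L^2(\mathbb{S}^{d-1})$ with null space

$$\ker \mathcal{H} = \bigoplus_{p=1}^{\infty} H^{2p,d} = \left\{f \in L^2_{\mathrm{even}}(\mathbb{S}^{d-1}) : \int_{\mathbb{S}^{d-1}} f(x)\, d\sigma(x) = 0\right\}.$$

Its nonzero eigenvalues $(\lambda_{k,d})_{k \in \mathbb{N}}$ ($k$ indexes the degree of the harmonics) are

$$\lambda_{0,d} = \frac{|\mathbb{S}^{d-1}|}{2}, \qquad \lambda_{1,d} = \frac{|\mathbb{S}^{d-2}|}{d-1}, \qquad \forall p \geq 1,\ \lambda_{2p+1,d} = \frac{(-1)^p\, |\mathbb{S}^{d-2}|\; 1 \cdot 3 \cdots (2p-1)}{(d-1)(d+1) \cdots (d+2p-1)}.$$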
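Note that $\forall p \in \mathbb{N} \setminus \{0\}$, $\lambda_{2p,d} = 0$. It is easy to check (see, e.g., [27]) that $\mathcal{H}$ is continuous from $L^2_{\mathrm{odd}}(\mathbb{S}^{d-1})$ to $H^{d/2}_{\mathrm{odd}}$ and that its inverse is continuous from $H^{d/2}_{\mathrm{odd}}$ to $L^2_{\mathrm{odd}}(\mathbb{S}^{d-1})$, where $H^{d/2}_{\mathrm{odd}}$ is the restriction to odd functions of the Sobolev space $H^{d/2}$. $H^s$ is defined, for arbitrary $s$, by

$$H^s = \left\{f \in L^2(\mathbb{S}^{d-1}) : (-\Delta)^{s/2} f \triangleq \sum_{k \in \mathbb{N}} \zeta_{k,d}^{s/2}\, L_{k,d} f \in L^2(\mathbb{S}^{d-1})\right\},$$

equipped with the norm

$$\|f\|_{2,s}^2 = \|f\|_2^2 + \left\|(-\Delta)^{s/2} f\right\|_2^2.$$

The inverse of a function $R$ in $H^{d/2}_{\mathrm{odd}}$ is

$$\mathcal{H}^{-1}(R) = \sum_{k\ \mathrm{odd}} \frac{1}{\lambda_{k,d}}\, L_{k,d} R \quad \left(= \sum_{k\ \mathrm{odd}} \frac{1}{\lambda_{k,d}} \sum_{l=1}^{L(k,d)} \langle R, h_{k,l}\rangle\, h_{k,l}\right); \qquad (18)$$

we use parentheses to stress that the last equality is not practical if we work in arbitrary dimensions but can nevertheless be used in proofs. Let us also recall the following Bernstein-type inequality from [8].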

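Proposition 4.

$$\forall d \geq 2,\ \forall p \in [1, \infty],\ \exists B(d,p) > 0 :\ \forall P \in \bigoplus_{\substack{k=0 \\ k\ \mathrm{odd}}}^{K} H^{k,d}, \quad \left\|\mathcal{H}^{-1} P\right\|_p \leq B(d,p)\, K^{d/2}\, \|P\|_p. \qquad (19)$$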
Throughout the paper, we denote by $\nu = d/2$ the degree of ill-posedness of the inverse problem. It is the same degree of ill-posedness as that of the Radon transform, which appears in tomography and in [12] for the estimation of the vector of random coefficients in the linear regression problem.
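For $d = 2$, $\mathbb{S}^1$ can be identified with $[0, 2\pi)$, $\mathcal{H}$ integrates over the half-circle $(\theta - \pi/2, \theta + \pi/2)$, and each $e^{ik\theta}$ is an eigenfunction with eigenvalue $2\sin(k\pi/2)/k$, in agreement with Proposition 3 ($\lambda_{1,2} = 2$, $\lambda_{3,2} = -2/3$, and so on). The Python sketch below is an illustration on a uniform grid, not the estimator of this paper. It applies a discretized $\mathcal{H}$ (trapezoid rule, computed as an FFT convolution) to an odd function and inverts it with (18). The function names are hypothetical.

```python
import numpy as np

def hemispherical_transform_circle(f_vals):
    """Discretized H on S^1: (Hf)(theta) = integral of f over (theta - pi/2, theta + pi/2).

    Trapezoid rule on a uniform grid of n points (n divisible by 4), computed as a
    circular convolution with a boxcar kernel via the FFT.
    """
    n = f_vals.size
    h = 2 * np.pi / n
    q = n // 4                                  # q grid steps = pi / 2
    kernel = np.zeros(n)
    kernel[: q + 1] = 1.0                       # offsets 0, ..., q
    kernel[n - q:] = 1.0                        # offsets -q, ..., -1
    kernel[q] = kernel[n - q] = 0.5             # trapezoid end points
    return h * np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(f_vals)))

def invert_hemispherical_circle(r_vals):
    """Inversion formula (18) on S^1: divide the odd Fourier modes by 2 sin(k pi / 2) / k."""
    n = r_vals.size
    k = np.abs(np.fft.fftfreq(n, d=1.0 / n))    # integer frequencies |k|
    coeffs = np.fft.fft(r_vals)
    odd = (k % 2 == 1)                          # even modes lie in the kernel of H
    out = np.zeros(n, dtype=complex)
    out[odd] = coeffs[odd] / (2 * np.sin(k[odd] * np.pi / 2) / k[odd])
    return np.real(np.fft.ifft(out))

n = 4096
theta = 2 * np.pi * np.arange(n) / n
f_odd = np.cos(theta) + 0.3 * np.sin(3 * theta)  # odd: f(theta + pi) = -f(theta)
r = hemispherical_transform_circle(f_odd)
f_rec = invert_hemispherical_circle(r)
print(np.max(np.abs(f_rec - f_odd)))
```

On odd frequencies the inversion multiplies by $1/|\lambda_{k,2}| = k/2$, which is the $\nu = d/2 = 1$ ill-posedness. Any noise in $R$ at degree $k$ is amplified by that factor, and the smoothing window and thresholding of Section 4 have to control this.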

2.4 Identification of $f_\beta$

Let us review the main arguments for the identification of $f_\beta$ that are taken from [8]. Imposing, as we do, that $\beta$ belongs to $\mathbb{S}^{d-1}$ is not sufficient. First, the left-hand side of (4) is only defined on the support of $f_X$. Throughout the article we make the following assumption; [8] present cases where it could be relaxed when we do not assume that all coefficients are random.

Assumption 5. $\mathrm{supp}\, f_X = H^+$ and $\mathbb{E}[Y \mid X = x]$ is well defined pointwise on $\mathrm{supp}\, f_X$.

First note that, because $f_\beta$ is a density,

$$\mathcal{H}(f_\beta) = \mathcal{H}(f_\beta^-) + \frac{1}{2}. \qquad (20)$$

We can now introduce the function $R$ such that

$$R(x) = \begin{cases} \mathbb{E}[Y \mid X = x] & \text{when } x \in H^+,\\ -\mathbb{E}[Y \mid X = -x] & \text{when } -x \in H^+. \end{cases} \qquad (21)$$

It is the unique extension of the regression function which is compatible with (1) and (20). We can now write

$$R = 2\,\mathcal{H}(f_\beta^-).$$

Thus, (1) implies implicitly, if $f_\beta$ belongs to $L^2(\mathbb{S}^{d-1})$, that $R \in H^{d/2}_{\mathrm{odd}}(\mathbb{S}^{d-1})$ and is thus continuous on the whole sphere. Also, from the properties of Section 2.3, there exists a unique $f_\beta^-$ in $L^2_{\mathrm{odd}}(\mathbb{S}^{d-1})$ such that $R = 2\mathcal{H}(f_\beta^-)$. The function $f_\beta^-$ can be retrieved via the inversion formula (18). We need yet another assumption to identify $f_\beta$; this is due to the non-invertibility of $\mathcal{H}$ on the whole $H^{d/2}(\mathbb{S}^{d-1})$ space.

Assumption 6. $f_\beta$ is defined pointwise and has a support included in some hemisphere.

Assumption 6 appears in both [8, 13]. In many applications this is a plausible assumption. It is the case, for example, if one coefficient has a known sign or if some coefficients are nonrandom. For example, if one regressor is the price difference, then the price coefficient is negative in the binary choice model. Indeed, when the price difference increases there is substitution from the good labeled 1 to the good labeled $-1$, and the choice probability for good 1 decreases.

Using Assumption 6, we can recover $f_\beta$ uniquely via

$$f_\beta = 2 f_\beta^-\, \mathbb{1}\{f_\beta^- > 0\}.$$

Note that we do not need to know which hemisphere contains $\mathrm{supp}\, f_\beta$. Given an estimator $\hat f_\beta^-$ of $f_\beta^-$, we shall always use $2\hat f_\beta^-\, \mathbb{1}\{\hat f_\beta^- > 0\}$ as an estimator of $f_\beta$. The first stage of the proof of Proposition 4.2 in [8] tells us how to relate the loss in the estimation of $f_\beta$ to that of the estimation of $f_\beta^-$.
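The identity $f_\beta = 2 f_\beta^- \mathbb{1}\{f_\beta^- > 0\}$ only uses the fact that, under Assumption 6, $f_\beta(b)$ and $f_\beta(-b)$ cannot both be positive. Indeed, $f_\beta^-(b) = f_\beta(b)/2 > 0$ on the support and $f_\beta^-(b) \leq 0$ elsewhere. A minimal Python check on a grid of $\mathbb{S}^1$ uses an even number of equally spaced angles, so that the antipode of grid point $m$ is $m + n/2$. The density below is a hypothetical example supported in the open half-circle $|\theta| < \pi/2$ (normalization plays no role, since the identity is pointwise).

```python
import numpy as np

def odd_part(f_vals):
    """f^-(b) = (f(b) - f(-b)) / 2 on a grid where the antipode of index m is m + n/2."""
    return 0.5 * (f_vals - np.roll(f_vals, f_vals.size // 2))

def recover_from_odd_part(f_minus):
    """f = 2 f^- 1{f^- > 0}, valid when f is supported in an open hemisphere."""
    return 2.0 * f_minus * (f_minus > 0)

n = 720
theta = 2 * np.pi * np.arange(n) / n
# hypothetical (unnormalized) density supported in the open half-circle |theta| < pi/2
f_beta = np.where(np.cos(theta) > 0, np.cos(theta) ** 2, 0.0)
f_minus = odd_part(f_beta)
print(np.max(np.abs(recover_from_odd_part(f_minus) - f_beta)))   # 0 up to rounding
```

The same pointwise rule is what turns an estimator of $f_\beta^-$ into an estimator of $f_\beta$.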
2.5 Random design

For the purpose of estimation, we also exploit the following relation, which is valid for any $g$ in $L^2(\mathbb{S}^{d-1})$:

$$\begin{aligned}
\langle R, g\rangle &= \langle R, g^-\rangle \quad (\text{because } R \text{ is odd})\\
&= 2\int_{H^+} R(x)\, g^-(x)\, d\sigma(x)\\
&= 2\int_{H^+} \frac{R(x)\, g^-(x)}{f_X(x)}\, f_X(x)\, d\sigma(x)\\
&= 2\,\mathbb{E}_X\left[\frac{R(X)\, g^-(X)}{f_X(X)}\right]\\
&= 2\,\mathbb{E}\left[\frac{\mathbb{E}[Y \mid X]\, g^-(X)}{f_X(X)}\right]\\
&= 2\,\mathbb{E}\left[\frac{Y g^-(X)}{f_X(X)}\right].
\end{aligned}$$

The expectation could be approximated by $\frac{2}{n}\sum_{i=1}^{n} \frac{y_i\, g^-(x_i)}{\hat f_X(x_i)}$, where $\hat f_X$ is an estimator of the unknown $f_X$, possibly trimmed to avoid division by quantities close to zero.

Like in [8], we rely on a plug-in estimator of $f_X$. Many such estimators exist, and we would like to mention one in particular: the needlet thresholding estimator of the density of [1].
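The identity above is what makes the coefficients of $R$ estimable from the sample. The Python sketch below checks it by simulation for $d = 2$ in a hypothetical design where $f_X$ is known. $\beta$ is uniform on the arc of angles $[0, \pi/2]$, $X$ is uniform on $H^+ = \{|\theta| < \pi/2\}$ (so $f_X = 1/\pi$ there), and $g(\theta) = \cos\theta$, which is odd so $g^- = g$. A direct computation in this design gives $R(\theta) = 1$ for $\theta \in [0, \pi/2)$ and $R(\theta) = 1 + 4\theta/\pi$ for $\theta \in (-\pi/2, 0)$, hence $\langle R, g\rangle = 8/\pi$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
phi = rng.uniform(0.0, np.pi / 2, size=n)            # beta = (cos phi, sin phi), f_beta = 2/pi
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=n)    # X = (cos theta, sin theta) on H^+
y = np.where(np.cos(theta - phi) > 0, 1.0, -1.0)      # model (1): sign of <X, beta>
f_x = np.full(n, 1.0 / np.pi)                         # known design density on H^+
g_minus = np.cos(theta)                               # g = cos is odd, so g^- = g

estimate = 2.0 * np.mean(y * g_minus / f_x)           # (2/n) sum_i y_i g^-(x_i) / f_X(x_i)
print(estimate, 8.0 / np.pi)
```

Here the true $f_X$ replaces the plug-in $\hat f_X$. With an estimated or trimmed density the same average is used, at the price of the additional error discussed above.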



3 Lower Bound 

The following theorem gives lower bounds on the minimax risk. 
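Theorem 7. Assume that $f_X \in L^\infty(H^+)$. When $p \geq 1$, $z \geq 1$, $q \geq 1$ and $s \geq p\left(\nu + \frac{d-1}{2}\right)\left(\frac{1}{r} - \frac{1}{p}\right)$, with the restriction $q \leq r$ if $s = p\left(\nu + \frac{d-1}{2}\right)\left(\frac{1}{r} - \frac{1}{p}\right)$ (the parameters are in the dense zone),

$$\inf_{\hat f_\beta}\ \sup_{f_\beta \in B^s_{r,q}(M)} \mathbb{E}\left[\left\|\hat f_\beta - f_\beta\right\|_p^z\right] \gtrsim \left(n\, \|f_X\|_{L^\infty(H^+)}\right)^{-\frac{zs}{2s + 2\nu + d - 1}}. \qquad (22)$$

When $p \geq 1$, $z \geq 1$, $q \geq 1$ and $(d-1)\left(\frac{1}{r} - \frac{1}{p}\right) < s < p\left(\nu + \frac{d-1}{2}\right)\left(\frac{1}{r} - \frac{1}{p}\right)$ (the parameters are in the sparse zone),

$$\inf_{\hat f_\beta}\ \sup_{f_\beta \in B^s_{r,q}(M)} \mathbb{E}\left[\left\|\hat f_\beta - f_\beta\right\|_p^z\right] \gtrsim \left(\frac{\log\left(n\, \|f_X\|_{L^\infty(H^+)}\right)}{n\, \|f_X\|_{L^\infty(H^+)}}\right)^{\frac{z\left(s - (d-1)(1/r - 1/p)\right)}{2\left(s + \nu - (d-1)(1/r - 1/2)\right)}}. \qquad (23)$$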
The proof of this result is given in Section 4. As discussed in Section 4.3 of [8], the classical assumption that $f_X$ is bounded from below is very restrictive for the model at hand. In the $d = 2$ case, it would imply that, in the original scale, $X$ has tails larger than the Cauchy tails. It is therefore important for applications to allow for densities which are not bounded from below. We make the dependence on $\|f_X\|_{L^\infty(H^+)}$ explicit. However, this does not show that the estimation problem is more difficult when $f_X$ can take values arbitrarily close to 0; it does not even take into account that $f_X$ is a density, since the lower bound decreases as $\|f_X\|_{L^\infty(H^+)}$ increases. We therefore expect these lower bounds to properly characterize the difficulty of the estimation problem when $f_X$ is bounded from below, but to be too optimistic otherwise.

[6] introduces, in the case of the estimation of the regression function with inhomogeneous designs, risks where the rate is a function and can vary with the points in the support of the density of the design. To our knowledge, there are no extensions to inverse problems.

It will also appear in Section 4 that even if $f_X$ were known but not bounded from below, good rates would require trimming of the density $f_X$ at design points where the density is low. Not knowing $f_X$ might degrade the optimal rates in one-step procedures. This is discussed in Section 5.



4 A needlet thresholded estimator when $f_X$ is known and bounded from below

4.1 Smoothed projections and needlet estimators 

In this section we present an ideal benchmark estimator. We assume that the density of the design is known and 
bounded from below. In practice it is unknown and in most cases unbounded from below (see the discussion in 
Section 3). 

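Using the identity of Section 2.5 with $g(\cdot) = L_{k,d}(\cdot, x)$ for fixed $x$, we estimate $L_{k,d}R(x)$ by

$$\widehat{L_{k,d}R}_I(x) = \frac{2}{n}\sum_{i=1}^{n} \frac{y_i\, L^-_{k,d}(x_i, x)}{f_X(x_i)},$$

where $L^-_{k,d}(x_i, x) = 0$ if $k$ is even and $L^-_{k,d}(x_i, x) = L_{k,d}(x_i, x)$ if $k$ is odd. The subscript $I$ stands for the ideal estimator where the density of the random design is known. Because $H^{k,d}$ is a vector space, $\widehat{L_{k,d}R}_I \in H^{k,d}$. A smoothed projection estimator with kernel (9) and smoothing window $a$ (in the ideal case where $f_X$ is known) can be written as

$$\hat f^-_{I,a,J} = \sum_{k\ \mathrm{odd}} \frac{a(k/2^J)}{2\lambda_{k,d}}\, \widehat{L_{k,d}R}_I.$$

We can also estimate $f_\beta^-$ using the needlet frame with the same smoothing window $a$. For $j \leq J$, the needlet coefficients are equal to

$$\beta^-_{j,\xi} = \langle f^-_\beta, \psi_{j,\xi}\rangle = \sqrt{\omega(j,\xi)} \sum_{k\ \mathrm{odd}} \frac{b(k/2^{j-1})}{2\lambda_{k,d}}\, L_{k,d}R(\xi) = \langle f^-_{I,a,J}, \psi_{j,\xi}\rangle$$

(collecting back the terms using that $a(k/2^J) = 1$ for $k = 0, \dots, 2^J$), where $f^-_{I,a,J} = K^a_J f^-_\beta$ is the expected value of $\hat f^-_{I,a,J}$ (the spherical convolution of $K^a_J$ and $f^-_\beta$). The needlet coefficients can be estimated by

$$\hat\beta^-_{I,j,\xi} = \sqrt{\omega(j,\xi)} \sum_{k\ \mathrm{odd}} \frac{b(k/2^{j-1})}{2\lambda_{k,d}}\, \widehat{L_{k,d}R}_I(\xi).$$

Moreover, for fixed $x$, the function $\xi \mapsto \hat\beta^-_{I,j,\xi}\, \psi_{j,\xi}(x)/\omega(j,\xi)$ belongs to $\bigoplus_{k=0}^{2^{j+1}} H^{k,d}$; thus, from the quadrature formula,

$$\sum_{\xi \in \Xi_j} \hat\beta^-_{I,j,\xi}\, \psi_{j,\xi} = \sum_{k\ \mathrm{odd}} \frac{b^2(k/2^{j-1})}{2\lambda_{k,d}}\, \widehat{L_{k,d}R}_I,$$

and, since $\hat\beta^-_{I,0,\xi} = 0$ (the estimate is odd and thus of integral 0 on the sphere),

$$\sum_{j=0}^{J} \sum_{\xi \in \Xi_j} \hat\beta^-_{I,j,\xi}\, \psi_{j,\xi} = \sum_{k\ \mathrm{odd}} \frac{\sum_{j=1}^{J} b^2(k/2^{j-1})}{2\lambda_{k,d}}\, \widehat{L_{k,d}R}_I = \sum_{k\ \mathrm{odd}} \frac{a(k/2^{J-1})}{2\lambda_{k,d}}\, \widehat{L_{k,d}R}_I = \hat f^-_{I,a,J-1},$$

where we used $b^2(t) = a(t) - a(2t)$, so that $\sum_{j=1}^{J} b^2(k/2^{j-1}) = a(k/2^{J-1}) - a(2k) = a(k/2^{J-1})$ for $k \geq 1$.

The smoothed projection and needlet estimators coincide. They are biased, and the bias corresponds to the approximation error.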

4.2 A data-driven thresholding scheme

The unbiased estimators of the needlet coefficients

$$\hat\beta^-_{I,j,\xi} = \frac{1}{n}\sum_{i=1}^{n} G_{j,\xi}(y_i, x_i), \qquad G_{j,\xi}(y, x) = \frac{y}{f_X(x)}\, \sqrt{\omega(j,\xi)} \sum_{k\ \mathrm{odd}} \frac{b(k/2^{j-1})}{\lambda_{k,d}}\, L_{k,d}(x, \xi),$$

are used to define the needlet thresholded estimator

$$\hat f^-_{I,a,\rho} = \sum_{j=0}^{J} \sum_{\xi \in \Xi_j} \rho_{T_{j,\xi},\gamma}\left(\hat\beta^-_{I,j,\xi}\right) \psi_{j,\xi},$$

where $\rho_{T_{j,\xi},\gamma}$ is a suitable level- and location-dependent thresholding function depending on some $\gamma > 1$. In the subsequent analysis, we consider the hard thresholding function

$$\rho_{T_{j,\xi},\gamma}(\beta) = \beta\, \mathbb{1}\{|\beta| > T_{j,\xi}\}.$$

The highest resolution level $J$ that should be used to obtain a needlet estimator of Section 4.1 that achieves the minimax rate of convergence depends on prior knowledge of the smoothness of the unknown density of the random coefficients. Hard thresholding is a nonlinear estimation method where we allow for a larger highest resolution level $J$, independent of the smoothness of the unknown function, but where thresholding allows us to perform a bias/variance trade-off at the level of the coefficients in the high-dimensional space. As we will see, this yields an adaptive procedure. We define the empirical variance estimator

$$\hat\sigma^2_{j,\xi} = \frac{1}{n(n-1)} \sum_{1 \leq i < l \leq n} \left(G_{j,\xi}(y_i, x_i) - G_{j,\xi}(y_l, x_l)\right)^2$$

and the data-driven thresholds $T_{j,\xi}$, built from $\hat\sigma^2_{j,\xi}$, $\gamma$, $\log n / n$ and $M_{j,\xi}$:

[display defining $T_{j,\xi}$ not recoverable from the source]

where $M_{j,\xi}$ is some upper bound on the sup-norm over $\{\pm 1\} \times H^+$ of $G_{j,\xi}(y, x) - \mathbb{E}[G_{j,\xi}(Y, X)]$ (remark that $M_{j,\xi}$ can be chosen equal to $2\|G_{j,\xi}\|_\infty$), and we use a shorthand notation for $\log n / n$.

Using (12) and Proposition 4, we get the following upper bound, which is uniform in $\xi$:

$$\|G_{j,\xi}\|_{L^\infty(\{\pm 1\} \times H^+)} \leq 2 C_\infty B(d, \infty)\, 2^{j(\nu + (d-1)/2)} \left\|\frac{1}{f_X}\right\|_{L^\infty(H^+)}. \qquad (24)$$

It depends on some prior knowledge of $\|1/f_X\|_{L^\infty(H^+)}$. The higher order term in the definition of $T_{j,\xi}$, which involves $M_{j,\xi}$, allows us to control the fluctuations of this estimated threshold.

The estimator of $f_\beta$ that we consider is

$$\hat f_\beta = 2 \hat f^-_{I,a,\rho}\, \mathbb{1}\{\hat f^-_{I,a,\rho} > 0\},$$

where $\rho$ is the above hard thresholding function with the data-driven threshold.
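The level- and location-dependent hard thresholding step can be sketched as follows. Given the values $G_{j,\xi}(y_i, x_i)$, the code forms $\hat\beta^-_{I,j,\xi}$, the empirical variance $\hat\sigma^2_{j,\xi}$ (the pairwise form above equals the usual unbiased sample variance), and a threshold of the generic empirical-Bernstein form $\sqrt{2\gamma \hat\sigma^2_{j,\xi} \log n / n} + c\,\gamma M_{j,\xi} \log n / n$. This form and the constant `c` are placeholders chosen for illustration, not the constants of this paper. Only the structure follows the text: a leading term driven by the estimated variance, plus a higher order term in $M_{j,\xi}$. The function names are hypothetical.

```python
import math
import numpy as np

def hard_threshold(beta_hat, threshold):
    """rho_T(beta) = beta * 1{|beta| > T}."""
    return np.where(np.abs(beta_hat) > threshold, beta_hat, 0.0)

def data_driven_threshold(g_values, gamma, m_bound, c=1.0):
    """Illustrative empirical-Bernstein-type threshold for one coefficient (j, xi).

    g_values: the n values G_{j,xi}(y_i, x_i); m_bound: an upper bound M_{j,xi}.
    The constant c and the exact form are placeholders, not the paper's choice.
    """
    n = g_values.size
    sigma2_hat = np.var(g_values, ddof=1)       # = sum_{i<l} (G_i - G_l)^2 / (n (n - 1))
    log_ratio = math.log(n) / n
    return math.sqrt(2 * gamma * sigma2_hat * log_ratio) + c * gamma * m_bound * log_ratio

def threshold_coefficient(g_values, gamma, m_bound, c=1.0):
    """Estimate beta_{j,xi} by the empirical mean of G, then hard-threshold it."""
    beta_hat = float(np.mean(g_values))
    return float(hard_threshold(beta_hat, data_driven_threshold(g_values, gamma, m_bound, c)))

rng = np.random.default_rng(2)
noise_only = rng.choice([-1.0, 1.0], size=5000)    # coefficient with mean 0: should be killed
signal = 1.0 + 0.1 * rng.standard_normal(5000)     # coefficient with mean 1: should be kept
print(threshold_coefficient(noise_only, 2.0, 2.0), threshold_coefficient(signal, 2.0, 3.0))
```

Because the threshold scales with $\hat\sigma_{j,\xi}$ rather than with a worst-case bound such as (24), low-variance coefficients are kept at smaller magnitudes.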



4.3 Two general inequalities 

We shall use below the constants $c_{1,z}$ and $c_{2,z}$ defined by

[displays (25) and (26) not recoverable from the source; both bound integrals of the form $\int z\tau^{z-1} e^{-(\cdot)}\, d\tau$]

Theorem 8. For all $r \geq 1$, $\gamma > 1$, $z \geq 1$, the two following inequalities hold: when $p = \infty$,

[inequality not recoverable from the source]

while when $p \in [1, \infty)$,

[inequality not recoverable from the source]



The inequalities of Theorem 8 provide a theoretical guarantee valid without any assumption on the function $f_\beta$. When $J$ is well chosen depending on $n$, and under some minimal regularity assumption on $f_\beta$ (see for instance Theorem 9), the only two meaningful terms are the approximation term $\|f^{I,a,J}_\beta-f_\beta\|_p$ and the term involving $\beta^{I,a}_{j,\xi}$ and $T^{\delta,++}_{j,\xi,\gamma}$. This second term can be interpreted in terms of an oracle inequality, where the oracle estimates a coefficient if and only if the error made by estimating this coefficient is smaller than the one made by discarding it. Indeed, such an oracle strategy would lead (when $p<\infty$) to a quantity of the form

$$\sum_{j=0}^{J}\sum_{\xi\in\Xi_j}\min\Big(|\beta^{I,a}_{j,\xi}|^p,\ \mathbb E\big[|\hat\beta^{I,a}_{j,\xi}-\beta^{I,a}_{j,\xi}|^p\big]\Big)\|\psi_{j,\xi}\|_p^p.$$

Proving that such an oracle inequality holds would require a lower bound on $\mathbb E\big[|\hat\beta^{I,a}_{j,\xi}-\beta^{I,a}_{j,\xi}|^p\big]$. Because in the inequalities of Theorem 8 the ideal quantity $\mathbb E\big[|\hat\beta^{I,a}_{j,\xi}-\beta^{I,a}_{j,\xi}|^p\big]^{1/p}$ is replaced by $T^{\delta,++}_{j,\xi,\gamma}$, we call this term a quasi-oracle term. The remaining terms can be made as small as we wish by taking $\gamma$ large enough. Upper bounds of this type, uniform on Besov ellipsoids, yield an approximation error which can be expressed in terms of the regularity of the Besov class and is uniformly small for $J$ large enough, and they allow us to treat the bias/variance trade-off in the quasi-oracle term uniformly over the ellipsoid.
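A toy illustration of the oracle quantity, with $p=2$ and unit weights standing in for $\|\psi_{j,\xi}\|_p^p$. The true coefficients and the per-coefficient errors $\mathbb E|\hat\beta-\beta|^p$ are made-up numbers. The point is only that keeping or discarding each coefficient individually beats both "keep everything" and "discard everything".

```python
import numpy as np


def oracle_quantity(beta, err_p, p, weights=None):
    """Sum over coefficients of min(|beta|^p, E|beta_hat - beta|^p) * weight."""
    beta = np.abs(np.asarray(beta, dtype=float))
    err_p = np.asarray(err_p, dtype=float)  # per-coefficient E|beta_hat - beta|^p
    w = np.ones_like(beta) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(np.minimum(beta ** p, err_p) * w))


beta = np.array([1.0, 0.5, 0.05, 0.0, 0.02])  # hypothetical true coefficients
p = 2
err_p = np.full(beta.shape, 0.01)             # hypothetical E|beta_hat - beta|^2
keep_all = float(np.sum(err_p))               # estimate every coefficient
kill_all = float(np.sum(beta ** p))           # discard every coefficient
oracle = oracle_quantity(beta, err_p, p)
print(oracle, keep_all, kill_all)
```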

Data-driven thresholds are known to perform much better in finite samples than thresholds involving deterministic upper bounds on the variance of the coefficients. The inequalities of Theorem 8 show that they work at least as well as a deterministic threshold using the unknown variance of each coefficient.

4.4 Adaptive estimation over Besov ellipsoids 

The general inequalities of the previous section can be used to derive minimax results. We consider here some Besov ellipsoids and obtain the following result.



l/z 



In the 



1 

fx 



1/2 



Theorem 9 Take $J$ such that $2^{J(\nu+(d-1)/2)}\|1/f_X\|^{1/2}_{L^\infty(H^+)}\le t_n^{-1}$. If $M>0$, $r\ge1$, $s>(d-1)/r$ and $q\ge1$, we have

(i) For any $z\ge1$, there exists a constant $C_\infty=C_\infty(s,r,\gamma)$ such that if $\gamma>z/2+1$,



sup E 



where 



'---I,a,p 

fl3 - fl3 



s-{d-l)/r 



1 



f- 



X 



/J-s ,00 ^ 



t. 



L°°(_H'+) 



(27) 



s + v- (d- l)(l/r- 1/2)' 

(ii) For $p\in[1,\infty)$, $q\ge1$ (with the restriction $q\le r$ if $s=p\big(\nu+\frac{d-1}{2}\big)\big(\frac1r-\frac1p\big)$), there exists some constant $C_p=C_p(s,r,p,\gamma)$ such that if $\gamma>p/2$,



sup E 

where fj, = fid with 

and w = vjd = r in the dense zone 

and n = Us with 



~I,a,p 



fJ-d 



S > p ll' + 



< epilog n)P-^M^ 
s 

s + u + {d-l)/2 
d-l\ /l 1 



fx 



HP 



L°°(H+) 



(28) 



Ms 



2 / \r p 

s-{d-l){l/r-l/p) 
s + i^- (d- l)(l/r - 1/2) 
The constant $\mu_{s,\infty}$ corresponds to the limit of $\mu_s$ as $p$ goes to infinity. It should be noted that these upper bounds blow up when $f_X$ is not bounded from below. We will see in the next section that trimming allows us to avoid this problem, at the expense of a more complicated control of the expected loss.
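A small helper for the rate exponents of Theorems 9 and 10, as they can be read from the statements: $\mu_d=s/(s+\nu+(d-1)/2)$ in the dense zone $s>p(\nu+(d-1)/2)(1/r-1/p)$, and $\mu_s=(s-(d-1)(1/r-1/p))/(s+\nu-(d-1)(1/r-1/2))$ in the sparse zone. Here $\nu$ is simply an input (the ill-posedness index defined earlier in the paper). The check confirms that the two exponents agree on the boundary between the zones and that $\mu_s$ tends to $\mu_{s,\infty}$ as $p\to\infty$.

```python
import math


def mu_dense(s, nu, d):
    return s / (s + nu + (d - 1) / 2)


def mu_sparse(s, nu, d, r, p):
    inv_p = 0.0 if math.isinf(p) else 1.0 / p  # p = inf gives mu_{s,inf}
    return (s - (d - 1) * (1 / r - inv_p)) / (s + nu - (d - 1) * (1 / r - 0.5))


def rate_exponent(s, nu, d, r, p):
    """Return (zone, mu) for smoothness s, index nu, dimension d, Besov r, loss p."""
    if not s > (d - 1) / r:
        raise ValueError("the results require s > (d - 1)/r")
    if math.isinf(p):
        return "p = infinity", mu_sparse(s, nu, d, r, p)
    if s > p * (nu + (d - 1) / 2) * (1 / r - 1 / p):
        return "dense", mu_dense(s, nu, d)
    return "sparse", mu_sparse(s, nu, d, r, p)


# d = 3, nu = 1.5, r = 1, p = 4: the zone boundary is s = 7.5.
print(rate_exponent(10.0, 1.5, 3, 1, 4), rate_exponent(5.0, 1.5, 3, 1, 4))
```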

5 The case where the density of the design is unknown and possibly unbounded from below

In this section, we consider a modified estimator to handle the case where the density of the design is unknown and possibly unbounded from below. We prove a modified version of Theorem 9 in that case.

5.1 Plug-in strategy 

We now assume that one has a preliminary estimator $\hat f_X$ of $f_X$, based on a different sample. Expectations are taken conditional on that first sample. The estimator can be trimmed by a proper constant $t$ to allow for designs with density approaching zero. This is particularly useful in the neighborhood of the boundary of $H^+$, in order to avoid too stringent assumptions on the distribution of the design.

Using a simple plug-in rule, $f_X$ can be replaced in the previous estimators by $\hat f_X$, yielding the estimated harmonic projection of the extended regression function



and w = Wg is arbitrary such that w > p 



u+(d-i)(i/2-i/{pyz)) 

s+u-{d~l){l/r~l/2) 



in the sparse zone 





of expectation 




where $\widetilde{(f_X/\hat f_X)}$ is the even extension to the whole sphere of $f_X/\hat f_X$ (initially defined on $H^+$). This gives rise to the following linear estimator




whose mean is 
The plug-in estimators of the needlet coefficients are



21- 



k odd 



P,a,J 



{fa , Vi < J 



which yields the thresholded estimator 



P,a,p 



In this section we consider the data-driven thresholds



n — 1 



where 



n 1 



n(n — 1) 



1=2 k=l 



Y — b 



fx{X,) 



k odd 



k4 



(29) 
(30) 



and $M^{P}_{j,\xi}$ is some upper bound on the sup-norm over $\{\pm1\}\times H^+$ of $G^{P}_{j,\xi}(x,y)-\mathbb E\big[G^{P}_{j,\xi}(X_i,Y_i)\big]=G^{P}_{j,\xi}(x,y)-\beta^{P,a}_{j,\xi}$, where the expectation is conditional on the sample used to estimate $f_X$, and $\beta^{P,a}_{j,\xi}$ is the expectation of $\hat\beta^{P,a}_{j,\xi}$, again conditional on the sample used to estimate $f_X$:



aP,a 

f = -OJ 



1 b 

o->e) E 



k 



k odd -^^.'^ 



The following uniform upper bound could be used 



iiu+{d-l)/2) 



fx 



^ Mr. 



(31) 



L°°(//+) 
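The double sum over pairs $(l,k)$ with $k<l$, normalized by $n(n-1)$, that appears in the (garbled) definition of the empirical variance above coincides with the usual unbiased sample variance, because $\sum_{k<l}(v_l-v_k)^2=n\sum_i(v_i-\bar v)^2$. The sketch checks this identity on hypothetical values of $G(X_i,Y_i)$.

```python
import statistics


def pairwise_variance(values):
    """(1 / (n (n - 1))) * sum_{l=2}^{n} sum_{k=1}^{l-1} (v_l - v_k)^2."""
    n = len(values)
    total = 0.0
    for l in range(1, n):
        for k in range(l):
            total += (values[l] - values[k]) ** 2
    return total / (n * (n - 1))


g = [0.3, -1.2, 2.5, 0.0, 0.7, -0.4]  # hypothetical values of G(X_i, Y_i)
print(pairwise_variance(g), statistics.variance(g))
```

The pairwise form is convenient in proofs (it is a U-statistic), while the second form is the cheaper one to compute.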



5.2 Upper bounds 

Below we denote, for $\pi\ge1$, by



fx 



TX 
where $C_{p,\pi}=2C_p'C_{\mathrm{proj}}B(d,p)|\mathbb S^{d-1}|^{(1/\pi-1/p)_+}$ and $C_{\mathrm{proj}}$ is the constant of the continuity of the smoothed projections (see Lemma 2.4 (c) of [26]). The expectations in the theorem below are conditional on the sample that is used to estimate $f_X$.



fx 



Theorem 10 Take J such that 2-^('^+('^-i)/2) 
have 

(i) For any $z\ge1$, there exists a constant $C_\infty=C_\infty(s,r)$ such that if $\gamma>z/2+1$,



< t-^. If M > 0, r > 1, s > (d- l)/r and q > 1 we 



sup E 



fl3 - fl3 



z~l ( T\,r P,a,J,r,TT 



<3"-Mnf <^Coo(logn)^-MM 

7r> 1 





1 




( 


fx 


tn] 







where 



s-{d-l)/r 



(32) 



s + u-{d-l){l/r- 1/2)' 

(ii) For $p\in[1,\infty)$, $q\ge1$ (with the restriction $q\le r$ if $s=p\big(\nu+\frac{d-1}{2}\big)\big(\frac1r-\frac1p\big)$), there exists some constant $C_p=C_p(s,r,p)$ such that if $\gamma>p/2$,



sup E 



~P,a,p 



fp ' ' - fl3 



< 3P~^ inf |cp(logn)P-l (^^^P,a,J,r,ny 





1 




r( 











tip 



(33) 



where fj, = fid with 



and w = r in the dense zone 



s > p ll' + 



s + u + {d-l)/2 
d-l\ (\ 1 



r p 



and fi = fig with 



Ms 



s-(d-l)(l/r-l/p) 



s + u-{d-l){l/r - 1/2) 
and w is arbitrary such that w > p '^^^'^Z^d-i){i}r-i/^) sparse zone 



d-1 



< s < p [ u + 



d-l\ /I 1 



r p 
Two quantities appear in the upper bound that account for the design and the estimation of the density of the design:



1 

fx 



L°°(J/+) 



and /'m^''*'"^'^''^) . Since in most design distributions of interest in the original scale 



, the corresponding density on the sphere $f_X$ is bounded from below, it is useful to work with estimators $\hat f_X=\max(\tilde f_X,t)$ which are trimmed versions of an original estimator $\tilde f_X$, for a properly chosen $t$. For such trimmed preliminary estimators we obtain

$$\left\|\frac{1}{\hat f_X}\right\|_{L^\infty(H^+)}\le t^{-1}.$$



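Now, the quantity

$$\frac{f_X}{\hat f_X}-1=\frac{f_X-\hat f_X}{\hat f_X}$$

appears in the second of these two quantities. It is possible to use the upper bound

$$\left\|\frac{f_X}{\hat f_X}-1\right\|_{L^\pi(H^+)}\le t^{-1}\big\|\hat f_X-f_X\big\|_{L^\pi(H^+)}.$$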



For a trimmed estimator, this yields, for example,



X 



TX 



fxl 



fx<t 



< (l+t-^fxho 



1/tt 



+ t-' 



fx - fx 



a{0<fx<t)' +t-^ fx -fx 



Moreover, for $u>1$,

$$\sigma\big(0<\tilde f_X<t\big)\le\sigma\big(0<f_X<ut\big)+\sigma\big(|\tilde f_X-f_X|>(u-1)t\big).$$
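A minimal numerical sketch of the trimming step and of the last inequality. It assumes a one-dimensional grid as a stand-in for $H^+$, a hypothetical design density $f_X(x)=2x$ that vanishes at the boundary, and a hypothetical noisy preliminary estimator $\tilde f_X$. With the uniform measure on the grid, $\sigma(\cdot)$ is proportional to a count of grid points. The inequality holds pointwise: if $0<\tilde f_X<t$ and $f_X\ge ut$, then $f_X-\tilde f_X>(u-1)t$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10_001)[1:-1]          # hypothetical grid standing in for H^+
f = 2.0 * x                                       # hypothetical density, tends to 0 at x = 0
f_tilde = f + 0.05 * rng.standard_normal(x.size)  # hypothetical preliminary estimator

t, u = 0.1, 2.0
f_hat = np.maximum(f_tilde, t)                    # trimmed estimator
print("sup of 1/f_hat:", float(np.max(1.0 / f_hat)), "<= 1/t =", 1.0 / t)

# Uniform measure on the grid: sigma(.) is proportional to a count of grid points.
lhs_count = np.count_nonzero((0 < f_tilde) & (f_tilde < t))
rhs_count = (np.count_nonzero((0 < f) & (f < u * t))
             + np.count_nonzero(np.abs(f_tilde - f) > (u - 1) * t))
print(lhs_count, rhs_count)
```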



Note that, on the one hand, $\pi=1$ is good for $\sigma(0<\tilde f_X<t)$ to be as small as possible, but a multiplicative factor $2^{J(d-1)(1/\pi-1/p)}$ is paid. On the other hand, choosing $\pi=p$ implies a multiplicative factor equal to 1. Thus, based on the upper bound, the best choice for $\pi$ and $t$ depends on the smoothness of $f_X$ and the sample size of the first sample, as well as on the function $u\mapsto\sigma(0<f_X<u)$.



6 Proof of Theorem 7 

Let us prove two lower bounds. They yield the lower bounds in the dense and sparse zones. We conclude by checking for which values of the parameters one rate is larger than the other.



6.1 Proof of the lower bound in the dense zone 



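Consider a set of measures $(P_m)_{m=0}^{\mathcal M}$ indexed by a finite family of densities $(f_m)_{m=0}^{\mathcal M}$, which are the distributions of an $n$ i.i.d. sample of $(Y,X)$ when $f_\beta=f_m$ and for a given $f_X$. The tower property of the conditional expectation yields that the Kullback-Leibler divergence between two measures $P_m$ and $P_0$ is given by

$$K(P_m,P_0)=n\,\mathbb E_{f_X}\left[\mathcal H(f_m)(X)\log\frac{\mathcal H(f_m)(X)}{\mathcal H(f_0)(X)}+\big(1-\mathcal H(f_m)(X)\big)\log\frac{1-\mathcal H(f_m)(X)}{1-\mathcal H(f_0)(X)}\right].$$

It is easy to check that

$$K(P_m,P_0)\le n\,\mathbb E_{f_X}\left[\frac{\big(\mathcal H(f_0)(X)-\mathcal H(f_m)(X)\big)^2}{\mathcal H(f_0)(X)\big(1-\mathcal H(f_0)(X)\big)}\right]=n\,\mathbb E_{f_X}\left[\frac{\mathcal H(f_m-f_0)(X)^2}{\mathcal H(f_0)(X)\big(1-\mathcal H(f_0)(X)\big)}\right].$$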
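The bound above is the pointwise inequality $K(\mathrm{Ber}(p),\mathrm{Ber}(q))\le(p-q)^2/(q(1-q))$ between the Kullback-Leibler and chi-square divergences of two Bernoulli laws, applied with $p=\mathcal H(f_m)(x)$ and $q=\mathcal H(f_0)(x)$. The sketch checks it on a grid and also checks the lower bound $q(1-q)\ge(1/2-C_b)^2$ used below when $|q-1/2|\le C_b$. The value $C_b=0.3$ is an arbitrary example.

```python
import math


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence K(Ber(p), Ber(q)) for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def chi2_bernoulli(p, q):
    """Chi-square divergence (p - q)^2 / (q (1 - q))."""
    return (p - q) ** 2 / (q * (1 - q))


grid = [k / 50 for k in range(1, 50)]
worst = max(kl_bernoulli(p, q) - chi2_bernoulli(p, q) for p in grid for q in grid)

# If q = 1/2 + a with |a| <= C_b < 1/2, then q (1 - q) >= (1/2 - C_b)^2.
c_b = 0.3
qs = [0.5 - c_b + k * (2 * c_b) / 100 for k in range(101)]
min_var = min(q * (1 - q) for q in qs)
print(worst, min_var, (0.5 - c_b) ** 2)
```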

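The general reduction scheme together with Corollary 2.6 of the Fano lemma from [28] yields:

Lemma 11 If, for $\alpha\in(0,1)$ and some positive integer $\mathcal M$,
(i) $f_m\in B^s_{r,q}(M)$ for $m=1,\dots,\mathcal M$,
(ii) $\forall m\neq l$, $\|f_m-f_l\|_p\ge2h>0$,
(iii) $\frac{1}{\mathcal M+1}\sum_{m=1}^{\mathcal M}K(P_m,P_0)\le\alpha\log\mathcal M$,
then

$$\forall z\ge1,\quad\inf_{\hat f_\beta}\ \sup_{f_\beta\in B^s_{r,q}(M)}\mathbb E\big[\|\hat f_\beta-f_\beta\|_p^z\big]\ge h^z\left(\frac{\log(\mathcal M+1)-\log2}{\log\mathcal M}-\alpha\right).$$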



We take the indices $m$ that correspond to vectors of 0 and 1 of size $|\Xi_j|$. We consider the family

$$f_m=\frac{1}{|\mathbb S^{d-1}|}+\gamma\sum_{\xi\in\Xi_j}m_\xi\psi_{j,\xi},$$

where $\gamma$ is small enough to guarantee the positivity of $f_m$ for all $m$ and the fact that for one of them, $f_0$, corresponding to a vector $m_0$, $\forall x\in H^+$, $|\mathcal H(\tilde f_0)(x)|\le C_b$ for some $C_b\in(0,1/2)$, where $\tilde f_0=f_0-1/|\mathbb S^{d-1}|$. Because $\mathcal H(f_0)=\frac12+\mathcal H(\tilde f_0)$, the last condition implies that $\mathcal H(f_0)(x)\big(1-\mathcal H(f_0)(x)\big)\ge\big(\frac12-C_b\big)^2>0$. This yields a family $A_j$ of functions of cardinality $2^{|\Xi_j|}$. The Varshamov-Gilbert bound (see, e.g., [28]) yields that there exists a subset $A'_j\subset A_j$ such that, for any two distinct elements indexed by $m_1$ and $m_2$, $\sum_{\xi\in\Xi_j}|m_{1,\xi}-m_{2,\xi}|\ge|\Xi_j|/8$. We denote the corresponding family of functions by $A'_j$; its cardinality is $\mathcal M\ge2^{|\Xi_j|/8}$. When $p=\infty$, we work with the whole family $A_j$.
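An illustration, not the proof in [28], of the kind of separated family the Varshamov-Gilbert bound provides. A greedy scan of $\{0,1\}^N$ (Gilbert's argument) keeps each vector that is at Hamming distance at least $N/8$ from those already kept. With $N=24$ playing the role of $|\Xi_j|$, it finds $2^{N/8}=8$ vectors that are pairwise at distance at least 3. The function name is ours.

```python
import math


def hamming(a, b):
    """Hamming distance between two bit vectors stored as Python integers."""
    return bin(a ^ b).count("1")


def greedy_separated_family(n_bits, min_dist, target):
    """Scan {0,1}^n_bits in increasing integer order, keeping every vector at
    Hamming distance >= min_dist from all kept ones; stop after `target` vectors."""
    kept = []
    for v in range(2 ** n_bits):
        if all(hamming(v, w) >= min_dist for w in kept):
            kept.append(v)
            if len(kept) >= target:
                break
    return kept


N = 24  # plays the role of |Xi_j|
code = greedy_separated_family(N, min_dist=math.ceil(N / 8), target=math.ceil(2 ** (N / 8)))
print(len(code), [format(v, f"0{N}b") for v in code[:3]])
```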

$|\gamma|\le2^{-j(s+(d-1)/2)}M$ implies that $\forall f_m\in A_j$, $f_m\in B^s_{r,q}(M)$. Take also $|\gamma|\asymp2^{-j(s+(d-1)/2)}$. Indeed, when $r<\infty$,

$$|\gamma|\,|\Xi_j|^{1/r}\,2^{j(s+(d-1)(1/2-1/r))}\lesssim|\gamma|\,2^{j(s+(d-1)/2)}\le M.$$
It is straightforward to check that the same condition is also sufficient when $r=\infty$.
Lemma 1 (ii) yields that for $p\in[1,\infty)$, $m_1$ and $m_2$ in $\{0,1\}^{|\Xi_j|}$,

II f - f II > UU" oj(d-l)(l/2-l/p) ( ^2-''('^"^) 
llJmi ^m.2llp ^ I /l'-p,A^ I " ^ 

while for $p=\infty$, $m_1$ and $m_2$ in $\{0,1\}^{|\Xi_j|}$,



i/p 



It 



CA 



i/p 



M2~^' ^ 2h 



i/p 



For $m_0$ and every $m$ in $\{0,1\}^{|\Xi_j|}$, we get

•1 ^ -2 



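$$K(P_m,P_0)\le\Big(\frac12-C_b\Big)^{-2}\|f_X\|_{L^\infty(H^+)}\,n\,\|\mathcal H(f_m-f_0)\|_2^2$$

$$\le\Big(\frac12-C_b\Big)^{-2}\|f_X\|_{L^\infty(H^+)}\,n\,2^{-2(j-2)\nu}\|f_m-f_0\|_2^2$$

(from (10), writing the squared $L^2$-norm as the sum of the squared $L^2$-norms on the spaces $H^k$ for $k=2^{j-1}+1,\dots,2^{j+1}-1$)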



< {c'i) 

< (C2) 



Cb 



Cb 



fx||Lo.(H+)n2-2(^--2)V 



(m^ - mo,^) 



C6A', 



-2 



/x||l°°(h 



+)n2-2(i-2).^2 



(m^ - mo,^)^g^, 



£2 



£1 



(Lemma 1 (i)) 



Cb] 2^ni/x||L^(H.)Cin2^-('^-i-2^)7^ 
2'l|/x||L-(H+)Cin2-2i(^+-) 



In the p = 00 case we replace A^- by Aj above. Cs is an upper bound and we can replace by the constants for 
Aj and A'j. Condition (iii) of Lemma 11 is satisfied once 



n\\fx\\L-iH^)2-''^'^-^-^^'-'^/'^ < 



acyi(log2) (l- Cb 



24^^+3(71 



The larger $h$ or $h_\infty$ above is obtained for the smaller $j$ in the above condition, thus $2^j\asymp\big(n\|f_X\|_{L^\infty(H^+)}\big)^{1/(2(s+\nu+(d-1)/2))}$.
Lemma 11 now yields, for every $p\in[1,\infty]$ and $z\ge1$,







fl3 - fp 


->( 

P \ 



1 



s + v + (d-l)/2 



6.2 Proof of the lower bound in the sparse zone 

Consider now the hypotheses 

where $\xi$ belongs to $\Xi_j$ and $|\gamma|\le2^{-j(d-1)/2}$ to ensure the functions are positive. The constant is adjusted so that for one of them, which we denote $f_0$, $\forall x\in H^+$, $|\mathcal H(\tilde f_0)(x)|\le C_b$ with $C_b\in(0,1/2)$.

We denote by $P_\xi$ the distributions of an $n$ i.i.d. sample of $(Y,X)$ when $f_\beta=f_\xi$ and for a given $f_X$. Here $\mathcal M$ is the cardinality of $\Xi_j$, thus $\mathcal M\asymp2^{j(d-1)}$. Like in [29], we make use of the following lemma from [22]. We denote by $\Lambda(P_\xi,P_0)$ the likelihood ratio. Recall that $K(P_\xi,P_0)=\mathbb E_{P_\xi}\big[\log\Lambda(P_\xi,P_0)\big]$.

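Lemma 12 If, for $\pi_0>0$ and some positive integer $\mathcal M$,
(i) $f_m\in B^s_{r,q}(M)$ for $m=1,\dots,\mathcal M$,
(ii) $\forall m\neq l$, $\|f_m-f_l\|_p\ge2h>0$,
(iii) $\forall m=1,\dots,\mathcal M$, $\Lambda(P_0,P_m)=\exp(z_m-v_m)$, where $z_m$ are random variables and $v_m$ constants such that $P_m(z_m>0)\ge\pi_0$ and $\exp\big(\sup_{m=1,\dots,\mathcal M}v_m\big)\le\mathcal M$,

then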



Vz > 1, inf sup E 



fp - fl3 



> 



h %o 



(i) is satisfied when $|\gamma|\le M2^{-j(s+(d-1)(1/2-1/r))}$. This is more restrictive than the condition ensuring positivity because we assume that $s>(d-1)/r$. Thus, we now take $|\gamma|\asymp2^{-j(s+(d-1)(1/2-1/r))}$. The quantity $h$ in (ii) is obtained as follows: if $\xi$ and $\xi'$ belong to $\Xi_j$,

$$\|f_\xi-f_{\xi'}\|_p\ge c\,|\gamma|\,2^{j(d-1)(1/2-1/p)}\gtrsim2^{-j(s-(d-1)(1/r-1/p))}.$$



$$P_{P_\xi}\big(\log\Lambda(P_0,P_\xi)>-j(d-1)\log2\big)\ge1-P_{P_\xi}\big(|\log\Lambda(P_0,P_\xi)|>j(d-1)\log2\big)\ge1-\frac{\mathbb E_{P_\xi}\big[|\log\Lambda(P_0,P_\xi)|\big]}{j(d-1)\log2}.$$

Thus, condition (iii) is satisfied when

$$\mathbb E_{P_\xi}\big[|\log\Lambda(P_0,P_\xi)|\big]\le\alpha j(d-1)\log2$$

for $\alpha\in(0,1)$. The same computations as in the beginning of Section 4.1 yield that we need to impose

$$\|f_X\|_{L^\infty(H^+)}\,n\,2^{-2j(s+\nu-(d-1)(1/r-1/2))}\lesssim j.$$
We can check that it is possible to take

$$2^j\asymp\left(\frac{n\|f_X\|_{L^\infty(H^+)}}{\log\big(n\|f_X\|_{L^\infty(H^+)}\big)}\right)^{1/(2(s+\nu-(d-1)(1/r-1/2)))},$$

which yields the desired rate.



7 Proof of Theorem 8 

7.1 A preliminary decomposition 

We know from [8] that for all $p$ in $[1,\infty]$



//3 ' - //3 



< 2 



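We also use that for $z\in[1,\infty)$,

$$\|\hat f^{I,a,\rho}_\beta-f_\beta\|_p^z\le2^{z-1}\Big(\|\hat f^{I,a,\rho}_\beta-f^{I,a,J}_\beta\|_p^z+\|f^{I,a,J}_\beta-f_\beta\|_p^z\Big).\qquad(34)$$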



The second term is the approximation error. Let us focus on the first term, which corresponds to the error in the high dimensional space.
Lemma 1 (i) yields



JB ~ JB 



<(j+l)-i^ 
i=o 



E 



PT. 



< ( J + 1)^-1 ^ (jnz2j{d~l)z{l/2^l/p) 
3=0 



When $p=\infty$, we thus have



JB JB 



< ( J + ly-^ c^"2^'('^-^)"/2 sup 

j=0 



while for $p<\infty$, we obtain



/3 



3=0 5eHj 
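the last inequality is obtained using the fact that when $p\ge z$,

$$\Big(\sum_{\xi\in\Xi_j}|a_\xi|^p\Big)^{z/p}\le\sum_{\xi\in\Xi_j}|a_\xi|^z,$$

while by the Hölder inequality when $p<z$,

$$\Big(\sum_{\xi\in\Xi_j}|a_\xi|^p\Big)^{z/p}\le|\Xi_j|^{z/p-1}\sum_{\xi\in\Xi_j}|a_\xi|^z.$$

Note that for the case $p<\infty$ the inequality is sharp if and only if $z=p$.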
7.2 Coefficientwise analysis

For simplicity of notation, we will sometimes drop the dependence on $\gamma$ in the sets of indices.
We first focus on 



PT, 



By construction, 



PI 



3,i 



max 



^\Pl:t\<T,,^,, + 



/3? 



3,i 



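We introduce two "phantom" random thresholds $T^{-}_{j,\xi,\gamma}=T_{j,\xi,\gamma}-\Delta_{j,\xi,\gamma}$ and $T^{+}_{j,\xi,\gamma}=T_{j,\xi,\gamma}+\Delta_{j,\xi,\gamma}$ for some $\Delta_{j,\xi,\gamma}$ to be defined later. They are used to define "big" and "small" original needlet coefficients. We will also use $T^{\delta,-}_{j,\xi,\gamma}$, $T^{\delta,+}_{j,\xi,\gamma}$ and $\Delta^{\delta}_{j,\xi,\gamma}$, which are, respectively and with high probability, a deterministic lower bound, upper bound and upper bound of the previous quantities. This yields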



^hi,z = max ( 131^ 



< max 



< max 



max ( lie-/ al^y lUa U-ps ,l|^/a|^^ lUa Kjis 



max li-?/ 



I<« I >^.«.7 " I <n«.7 ' ^ I I >^.e.7 >n«.7 



max I ll „t, |<7is ;l|fl^,a_£{a K A , , 



max 1 l|c-/,a_oa Ka., '■'■Ifl" I^T'' 



max [ lioa |<T''''+ ' ^T" ' ■'■1^?^'° . , 
'hi



'hi 



max 



J\5 I 



\>T' 



and sorting them according to the number of random terms 



5j^^^z ^ max 



PI 



j,i 



J. €.7 



hi 



■^\I3''A>t''-- ' 



max 



--/3^,|>A,.,.,'X^7>n.,7. 



7.3 Scalewise analysis 

Defining Mj^^ as 

and Sj^z as 



Mj 2 = sup 



-Phi 



ieE, 



{(^'hi)-f^hi 



= sup (5j,5,2 

iGEj 



i&Ej 



we obtain 

Mj^2 < max 



sup 



a ^ 



sup 

i&Ej 



f3 j £ max I \rp3,+ ^j^s 
^1 - 



■'■1/3° J <T' 

ieEi 



^ max(Mfo,M^l,Mf,i,Mf,2) < Mj^! 

+ X 1/3? J max (lrpS,+ rps 



max I l^b,- ,l|-^i,a Ka . , 



rj,?l J, 5, 7 , — 



We can 



i&, 
+ 

bound the expectations of each term as follows:

z 

— J. €.7 



'i-rps,+ ^rps jll-^/.a on 
i.f.^^ i.f.7 Pj,^ 



>^J,?.7, 



max I 1^6,- ^6 ,l|-^7,a |>A t 



E 




= sup 


Phi 








E 




< sup 


Phi 






i&j 





'E 



sup max ( lrpS,+ rps , 

5eHj V j.i.i^j.i.j \Pj,i 
< sup



E 




< E 


sup 




z 













Using the Hölder inequality with $r>1$ to be specified later,



E 



_B2 



J, 2 



< E 



< E 



sup 



sup 



E 



sup max 1^6,- ,l|-?7,a I A 



-| 1-1/t 



nl/r 



1 



E 



E 



S. 



fa <'7^'^'~'~ 



S. 



SI 



E 



max 1 



E 



S: 



Bl 



= E 

?GH, 
?GH, 

= EiE[|^|-/3^e 

< EE[|^,f-/5j:e 



1 






1 







E 



S: 



B2 



max I Xrpb,- yrph jll^^.n fla 



J. €.7 i,?,7 \' ^j,€ 



using the Hölder inequality with $r>1$ to be specified later.

The following concentration inequalities allow us to control the stochastic terms appearing in those bounds.

7.4 Concentration inequalities 
7.4.1 Bernstein inequality and the $\hat\beta-\beta$ terms

We denote by

$$\sigma_{j,\xi}^2=\mathbb E\Big[\big(G_{j,\xi}(X_i,Y_i)-\beta^{I,a}_{j,\xi}\big)^2\Big]$$

the variance of $G_{j,\xi}(X_i,Y_i)$.
Lemma 13 For any $c_\sigma$ and $c_M$ positive,



E 







{ ( 


)1 


< 2 


C2,z 









Ca + CM—T^ I 



Proof. The Bernstein inequality yields 
setting u = T{ccrCrj^^ + cmM-^^ yields 



3n 



,1 ,2 



< 2 



4iwT~ 

J, 5 



< 2 



4" I Cct+Cm-^ 



ii ^2 



+ e V j.« / 



We now use



and the upper bounds (25) and (26) to derive 



E 



c^al^ + CMMl. 



< / ZT 



c^al^ + cmMI^ 

( 



> t\ dr 



+ e 



dr 



this yields (35) □ 
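Lemma 13 rests on the Bernstein inequality. In its standard form, for i.i.d. variables with variance $\sigma^2$ and $|X_i-\mathbb EX_i|\le M$, it reads $P\big(|\bar X_n-\mathbb EX|\ge\sqrt{2\sigma^2u/n}+Mu/(3n)\big)\le2e^{-u}$. We use this textbook statement here and do not claim it is character-for-character the display in the proof. A quick Monte Carlo sanity check with Uniform$(-1,1)$ variables ($\sigma^2=1/3$, $M=1$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, u = 50, 20_000, 2.0
sigma2, M = 1.0 / 3.0, 1.0  # Uniform(-1, 1): variance 1/3 and |X - mean| <= 1
deviation = np.sqrt(2.0 * sigma2 * u / n) + M * u / (3.0 * n)
means = rng.uniform(-1.0, 1.0, size=(reps, n)).mean(axis=1)
freq = float(np.mean(np.abs(means) >= deviation))  # empirical tail frequency
bound = 2.0 * np.exp(-u)                            # Bernstein tail bound
print(freq, bound)
```

Integrating such tail bounds against $z\tau^{z-1}$ is what produces the constants $c_{1,z}$ and $c_{2,z}$ in the moment bound.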

Taking Co- = 1 and cm = yields 



E 



< 2 2c 



■2,z 



n 



(35) 



(36) 
while taking $c_\sigma=c'_\sigma\sqrt{\log n/n}$ and $c_M=c'_M\log n/(n-1)$, we obtain
/ 



E 



M' 



\ c^Vlogn^ + 4/ log 







f ( 


)] 


< 2 


C2,z 









J, 5 



V 



3./ 



c'aVnV^ognjff + log 

i,5 



< 2 C2,, 2^-_ 



3 4/ log n 



(37) 



The following lemma is useful for the $p=\infty$ case.
Lemma 14 For any $\Xi'_j\subset\Xi_j$,



E 



sup , , , 



< 



2^/2 



V 



I 



2 + (log (C2,. 



z/2 



Proof. A uniform union bound yields 



sup J j- 



> r > < min 



3n . r '^if I 



( 



2 + (log(ci,,|H;.|))') (38) 



V 
/ 



< min 



1, 



+ e 



V 



2e V ' 



l) 



V 



+ min I 1, 

We thus derive, for any $\tau_1$ and $\tau_2$ positive,



4" I c^infjgH' T7r-+cj\/ K 

2e V ^ " .« 



E 



sup J J 



< I ZT^ ^ min 



2e 
f




/ ZT^ ^ mill 


1, 






V 





2e 



.1 



dr 



ZT 



z-1 



T>T2 



2e 



M ^2 



2-1 



r>ri 



2e 



n Ca inf. 



dT. 



Let 



log ( 



Cl„ 



3n 



and 



T2 



2^2 t^'^K"' 



Ca infggs/ + CM 



J,? 



+ CM infggs' — ^ 



by construction 



Vr > Ti, 
Vr > r2, 



2e 



2e 



2 8 

< e 

Cl,2 



< e 

C2,2 



M ^2 



This implies 



E 



sup , 



< 



2^2 



log (C2,; 



\ I 

+ 2 



2^2 



+ 



8 loglci,^ 



V 



3n 



c^ inf^gS', + CM 



V 



+ 2 



C(7 + CM inf^eH'. / 



3n 



inf^gH' ]gr- + CM 



which allows us to conclude. □

If Co- = 1 and Cm = (38) reduces to 



E 



sup 



< 



'2V2' 



2+ log C2 



2/2 



8 

— sup 



2+ log ci 
Note that one could have used the uniform bounds (see (24)) and

$$\sigma_{j,\xi}\le C_2B(d,2)\,2^{j\nu}\left\|\frac{1}{f_X}\right\|^{1/2}_{L^\infty(H^+)}$$

instead of $M_{j,\xi}$ and $\sigma_{j,\xi}$, and obtain



E 



sup 



< I "^a'] (2+(log(c2,. 



n 



1/2 

z/2 



2^" = al. 



2+ log ci, 



Along the same lines, with $c_\sigma=c'_\sigma t_n$ and $c_M=c'_M\log n/(n-1)$, we obtain



E 



sup 



M' 



\ c'^Vlogn^ + c'^j log 



< 



Ml 



c^Vlogn + 4^ log n/ Vn^inf^gs; - 



2+ log C2 



z/2 



8/3 



< 



d^yjn logninf^g 
2^2 ^ " 



+ c'm log n 



recall that when $\Xi'_j=\Xi_j$, $|\Xi_j|\le C_\Xi2^{j(d-1)}$.



z/2 



+ 



2+ log ci 



3c'^/logn 



2+ log ci,^ 



(39) 



(40) 



(41) 



7.4.2 Empirical Bernstein and the probabilities 

We define 

A 



14 , 7logn 



2A 



t: 



3A 



■j>€.7 



J. 5,7 



A". 

J.5.7 



r 26 , ^ r 7 log n 
— r 2 , ^ r 7 log n 



^^7 = ^i:«,7 t;:^ = 3a+^^. 
Lemma 15 The following upper bounds hold



< 



iid-i). 



Proof. Using the results of [25] we get 



Ml ■ 



> ^2^^ + — M/ 



< 3e-" 



3 " ^'^n - 1 

which yields the first inequalities. The second set of inequalities is obtained using a union bound.



□ 
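The proof of Lemma 15 invokes an empirical Bernstein inequality from [25]. As an assumption, we illustrate it with the classical one-sided bound of Maurer and Pontil (2009) for i.i.d. variables in an interval of length $R$: with probability at least $1-\delta$, $\mathbb EX-\bar X_n\le\sqrt{2V_n\log(2/\delta)/n}+7R\log(2/\delta)/(3(n-1))$, where $V_n$ is the unbiased sample variance. The sketch estimates the failure frequency by simulation. The helper name is ours.

```python
import numpy as np


def empirical_bernstein_radius(x, delta, value_range):
    """sqrt(2 V_n log(2/delta) / n) + 7 R log(2/delta) / (3 (n - 1)), V_n unbiased."""
    n = x.size
    v_n = x.var(ddof=1)
    log_term = np.log(2.0 / delta)
    return np.sqrt(2.0 * v_n * log_term / n) + 7.0 * value_range * log_term / (3.0 * (n - 1))


rng = np.random.default_rng(3)
n, reps, delta = 100, 5_000, 0.05
true_mean = 0.5  # Uniform(0, 1), so R = 1
failures = 0
for _ in range(reps):
    x = rng.uniform(0.0, 1.0, n)
    if true_mean - x.mean() > empirical_bernstein_radius(x, delta, value_range=1.0):
        failures += 1
print(failures / reps, delta)
```

Unlike the Bernstein inequality, the radius uses only observed quantities, which is why thresholds built on $\hat\sigma_{j,\xi}$ are computable.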



7.5 The p = ∞ case

7.5.1 Error in the high dimensional space 



E [M,- ,] < E 



+ E 



SI 



J, 2 



+ E 



J, 2: 



+ E 



with 



E 
E 



M. 



so 



sup 



/3? 



I J. €,7 



M, 



51 



< C^2^('i~^)— sup 



/5? 
E



E 



M. 



Bl 



< E 



sup 



M. 



B2 



4 y-i/^ ( (2^ / 



n7 



where we have used $(a+b)^{1/r}\le a^{1/r}+b^{1/r}$ for $a,b\ge0$ and $r\ge1$.
This yields 



E 



J 13 J 13 



ij + iy-^c'^ 



j=0 



3° I <T'''+ ~^ ^ 
J,? I— i.€.7 



sup 



Ls" J>T'''r 



+ i_CH^2^-('^-i)(^/2+i)sup /3^5 



j=0 



/I X l-l/r J 



+ (-M/) (2V- + (log(|H,|ci,,,))^ 



7.5.2 The $R_{1,\infty,z}$ and $R_{2,\infty,z}$ terms



'2, 00, 2 



The $R_{1,\infty,z}$ term is exactly the term appearing in Theorem 8, and thus we only need to bound $R_{2,\infty,z}$.

As in the $p<\infty$ case, one can plug in the uniform bounds on $\sigma_{j,\xi}$ and $M_{j,\xi}$ as well as the bounds $|\Xi_j|\le C_\Xi2^{j(d-1)}$ to obtain



4 X 1-1/r J 



i=o 



'2V2 



n 



C2B{d,2)2^^ 



1 



1/2 N 



f^CooS(d,oo)2^('^+('^-i)/2) 



2V- + (log(c2,..|Hj|))^/2^ 

(2i/- + (log(ci,,,|Hj|))^) 



1 



4 

< (Ch— ) 



'2V2 



77, 



C2S(d,2) 



fx 

1/2 

L°°(//+) 



L°°{_H'+)^ 



2^- + (log C2,..))^/') 5] 2ii-^Hd-l){z/2+l-l/r)) 

j=0 
_8_

3re 



CooB{d,oo) 



1 




fx 


L°°(H+)/ 



) (2V- + (log(|H,|ci,,.)r)E2^( 

/ 7=0 



< ( C=— ^ 



'2^2 



C25(d, 2) 



2i/- + (log(c2,..|Hj|)) 



+ I — CooS(d,Oo) 



7.5.3 The $O_{\infty,z}$ term

Denote by 

oi 



1 




fx 


L°°(H+)/ 



1/2 ' 
L-(/f+), 

2i/- + (log(ci,,.|Hj|)) 



uz+id~l){z+l~-l/T)) 

2J(t/^+((i~l)(2/2+l-l/T)) 



z/2 



1 _ 2-(!^2+(d-l)(^/2+l-l/r)) 
2J(!^2+(rf-l){2+l-l/r)) 



1 _ 2-(''^Hd-l){z+l-l/T)) 



z,j - sup 



Because $T^{\delta,++}_{j,\xi,\gamma}\ge T^{\delta,+}_{j,\xi,\gamma}$, we get



E 



sup 



E 



/3" J<r'''7 

I J. €.7 



+ E 



sup 



sup 



rj,?l J, €,7 



■E 



sup 



< E 



sup 



J. €.7 



+ E 



sup 



J. 5. 7 — r J.5 1 j:?:7 



sup 



/3? 



thus 



0',j < I 1 + E 



sup 



sup 



+ E 



sup 



r'''++> J>T'''" 

J. 5. 7 — I 3,4,7 



3", >Tf'7+ 
j,€l J, 5, 7 



Using now (41) with = ^/2^ and c'^^ = |7, we get as < C=2-'^'^ gives the upper bound in Theorem 8. 



Remark that using (41) with $\Xi'_j=\Xi_j$ is rough, since the sup could be taken on the much smaller subset



7.6 The p < ∞ case

7.6.1 Error in the high dimensional space 

We have 



E 



S: 



so 



+ E 



S: 



SI 



+ E 



S. 



Bl 



+ E 



S: 



B2 
with



E 
E 
E 

E 



qSI 



qB2 



< 5:e[|^--/3^ 

41-1/^ 



/3" J<r'''7 



< 



■'■1/3'^ J >t'''" 
/ / \ ^ 



4 l/{zr)Mli:y^ 



n 



where we have used $(a+b)^{1/r}\le a^{1/r}+b^{1/r}$. This yields



E 



//3 ~ //3 



j=0 

A J 



1|/3%|<T%+ +^ 



'^\b'^A>t''' 



j=0 



+ 



22-1/t ^ 



7(l-l/r) 



n 



4^1/(.r)^i,? 



n 



7.6.2 The $R_{1,p,z}$ and $R_{2,p,z}$ terms

The $R_{1,p,z}$ term appears as is in Theorem 8.

To bound the term $R_{2,p,z}$, we rely on the uniform bounds on $M_{j,\xi}$ in (24) and on $\sigma_{j,\xi}$ in (39). We obtain



< 2^^" l2cl(^'/^C2Bid,2)2^'' 



1 


1/2 1 \ 







+ E2'/^ (U!t^Co.B{d,oo)2^(^' 



l)/2) 


1 






fx 


L°°{_H'+) / 



RR n° 7647 



Adaptive estimation in the nonparametric random coefficients binary choice model by needlet thresholding 38 





1 


r 


fx 



+ Ch2V^ (\c\fj^C^B{d,oo) 



z/2 

L°°(/i'+) 



1 



2j{{d-l)+zv) 



TX 



1 



2jiid~l)+z{u+id^l)/2)) 



thus 



+ 



^2J'(^-i)^(V2-i/(pvz)) 



4 



3=0 





1 


y 


fx 



z/2 

L°°(H+) 
1 



2i((rf-i)+2;i') 



^2i(('^-i)+^('^+('^-i)/2)) 



L°°(_ff+) 



<Ch2V- (2c^/ir^C2S((i,2); 





1 




fx 



< 



Ch21/- (24/jf)c2i3((i,2) 



z/2 

1 



J 



1 _ 2-^(!-+(rf-l)A+{'i-l)(l/2-l/(pV^))) 

C^=2V- (|c;/ir)Cooi?(d,oo 



fx 

1 



^ \^ ni2(l/+(d-l)A+{rf-l){l/2-l/(pV^))) 

j=0 



x 



z/2 



n^/2 



2Jz{u+{d-l)/z+{d-l){l/2-l/{pyz))) 



1 _ 2-<'^+{d~l)/z+id~m-l/{pVz)) 



1 



TX 



7.6.3 The $O_{p,z}$ term



Denote by 



O 



'hi 



/3° j<t;''+ 



+ E 



3", >T'''r 

J.tl J. 5. 7 



Because $T^{\delta,++}_{j,\xi,\gamma}\ge T^{\delta,+}_{j,\xi,\gamma}$, we get



E 



LS" J>T°'" 



E 



< E 



I oa KtS. + + + IE 



3" , >T 



j.t.7 — r j,4 1 J. €.7 



j.€l J.?. 7 



E 


^1 - 


z- 






z 



j,4,7 — r j,€ I J,?, 7 
E


^1 - 


z- 






z 



PI 



■E 



1 I aa I ^0^-5,4-+ • 



Now using the results of Section 7.4.1 with T,', = yflqtnCrL + W^^L 

1 



sup ■ 



E 




z- 




z 



< SUp2 C2,z 2- 



4 1 

' 3 2/37 log n 



< 2 



.1/^ 

'2,2 



2c 



+ 



1/2 

1,2 



n 



7 log n 



thus 



Op,, < (1 + 2 



V2< 



.1/2 
-2,2 



-y/7 log n 



+ 



3", <t: , 

J,t.\— J, 4,7 



7logn 

.,++ + E 



j=0 



1/3° J>T'''+^' 



8 Proof of Theorem 9

The proof of this result requires upper bounds on the approximation error and on the $R_{1,p,z}$ and $O_{p,z}$ terms in the upper bound of Theorem 8 when $z=p$, using the prior knowledge that the unknown $f_\beta$ belongs to the ellipsoid $B^s_{r,q}(M)$.

8.1 The p ∈ [1, ∞) case

8.1.1 The approximation error 



j,—I,a,J „_ 
JB ~ J 8 



E E Pl&M 



From Lemma 1 (i) and the definition of the Besov spaces as sequence spaces,



•)Js 



E E Pl^^M 



j>J ' ^^^"^ 



< 



B- 



which yields that 



E E 



< 



MG 



l/p-l/r 



g ~ [ M2^'^('*^('^~^)(^/''~^/p)) if r < 



if r > p 
p 









It is enough to consider the worst case where r < p and to check that ^ ^'t+(d-i)/2^^^ ^ in the two zones. 
On the first zone s > (i/ + (£ - 1) thus s + u + ^ > (u + 2 ^hich yields < ^^^Li^p ^ 

Because s > {d - l)/r and p > r, s - ^ + ^ - f = {d - 1) ~ ^ (r " j) ^ 0' ^^^ich yields s - {d - 
l)(l/r — 1/p) > y and gives the result. 

On the second zone, it is straightforward, because s > {d — l)/r, that '^"^^^^^-^ly-i^^^ — s+Z^(d-i){i/r'-^i/2) • 



8.1.2 The $R_{1,p,p}$ and $R_{2,p,p}$ terms
Using Lemma 2 (iii) we obtain that



j=Q 



where the exponent is nonpositive because $s>(d-1)/r$, thus

4 



1 



Rl,p,p < ^^{J + lT ^MPC'^PC^ ^ _ 2_p(s+{d-l){l/p-l/(pAr))) • 

With $\gamma>p/2$, $R_{1,p,p}$ is of lower order than the rate.
We also have 



With the aforementioned choice of J, ^2"'(''+('^ ^^/^^ 



1 



1/2 



2J(d-i) 



1 

.fx 



< 1 and 

L°°(H+) ~ " 

decays to 0). Together these yield that bn.p^p^j^r is of the order of a constant. 

This term is also of lower order than the rate for $r$ large enough such that $\gamma(1-1/r)>p/2$.



L°°(J?+) 



< 1 (it even 



8.1.3 The $O_{p,p}$ term

First note that $a_{n,p,p,J}=1+o(1)$.
We take $T^{\delta,++}_{j,\xi,\gamma}$ uniform in $\xi$:



Tl'++ = 3y/2^tnC2B{d,2)2^' 



_A rpS,+ 
~ 3,1 



1 



JX 



1/2 
L°°(_H'+) 



+ 52CooB(d,oo)2J 



1 


1/2 / 


fx 


L°°(_ff+) V 


1 




/x 


L°°(_ff+) ^ 



n — 1 



TX 



7logn 
n — 1 



(because of the upper bound on J) 



3V2C2B{d, 2) + 104Coo-B(d, 00)) (for n > 2) 
as well as the following consequence of (36)



E 



< 2 2c 



2,p 



< 2\2cyiC2B{d,2)2^' 



1 


1/2 1 \ 


fx 


L°°(H+) \/"/ 



l)/2) 


1 






fx 





< 2^P^- 



< 



< 



T. 



hi 



+ 2(^^cl/;CooS(ci,oo)2^(^+(^ 

p/2 

L°°(/i'+) 

c^/^^C2g(^, 2) + |cl/;Cooi3((i, oo) \ 



X 



2^+' (cyiC2B{d,2) + -cJJ'Cooi?(d,oo; 



r2 2 



(7 log n)p/2 1 3V2C2B{d, 2) + 104Coo5(d, oo) (^) I 



t: 



V2 



i/p 

l/p J^iJL 

Co „ -|- 



(7logn)P/2 1 3 2.P ^ 



Let 



= 3V2C2B{d,2) + 104CooB{d, oo)^ 



= 2Vp I 



'2,P 



78^7 



Now, for any < z < p, 



+ E 



1/3° J>T^*'++ 
rj.€l J. 7 



< ( 1 + 



CP 

"■jP 



< 1 + 



(7 log n)P/2 
CP 

(7 log ?i)P/2 



1/2 



X 



L°°(H+) 



We then need to sum over $j$ and will take two different values for $z$: one that we denote $z_1$ for $j\le j_0$ and one that we denote $z_2$ for $j_0<j\le J$. The values $z_1$, $z_2$ and $j_0$ will be specified later, depending on the values of the parameters $r$, $q$, $s$ and $p$, that is, on whether we are in the dense or the sparse zone. Up to a multiplicative constant, we thus need to control



A + B 



1 



Tx 



+ 



1 



1/2 

L°°(/i'+) 
1/2 



\P-^1 jo 

/ i=0 ?GH, 



fx 



P-Z2 J 



I,a 
'hi 



where we choose $z_1$, $z_2$ and $j_0$ adequately in the two zones. Because of Lemma 2 (i), we only consider $p>r$.
Let us first consider the dense zone. We define

$$\tilde r=\frac{p(\nu+(d-1)/2)}{s+\nu+(d-1)/2}.$$

In the dense zone, $\tilde r<r$, $p>\tilde r$ and

$$s=\Big(\nu+\frac{d-1}{2}\Big)\Big(\frac{p}{\tilde r}-1\Big).$$

With $z_2=r$, we get



B < 



1 



X 



1/2 

L°°{H+) 



p~r J 



i=io+i 



Lemma 2 (iii) gives that

$$\sum_{\xi\in\Xi_j}|\beta^{I,a}_{j,\xi}|^r\le D_j^r\,2^{-jr(s+(d-1)(1/2-1/r))},$$

where $\forall j\in\mathbb N$, $D_j\ge0$ and $(D_j)_{j\in\mathbb N}\in\ell^q$. Note that

1 1\ d-l 



s + {d-l)[--- 



thus 



5 < 



TX 



1/2 

L°°(/i'+) 
1 



2f 



p-r J 



/ j=jo+l 



1 /2 \ P"'^' 

^ tA 2^oKl"^)(^H-^) 

L°°{H+) / 



for g > 1 if r > f and for gr < r if r = f (i.e. s = p (^u + ^2^^ (^^ — ) . Taking 2-''' 2 ) 
we get 



(42) 



(43) 



1/2 

L°°(_ff+) 



5 < 





1 




fx 



1/2 

in 

L°°(/i'+) / 
which is the rate that we expect in that zone.

As for A, we take zi = r < f < r, this yields, using Lemma 2 (iii), 



A < 



1 



IX 



1/2 JO 
1 

1/2 



fx 

1 



fx 

1 



j=0 

1/2 jo 

£ 2-''[''(P-^)+("'-^)(P/2-l)-r(s+(d-l)(l/2-l/r))] 

/ i=o 

t„ g2^P(^+('^-i)/2)(i-^AT (using (42)) 
/ i=o 



/x 
1 



fj. 



L°°(H+) 
1/2 

L°°(H+) 
1/2 



p—r 



t 2^0P(''+('^-l)/2)(l-r/f) 



tn (from the definition of jo)- 



Let us now consider the sparse zone. We define $\tilde r$ by

$$\tilde r=p\,\frac{\nu+(d-1)(1/2-1/p)}{s+\nu-(d-1)(1/r-1/2)}$$

in such a way that

$$p-\tilde r=p\,\frac{s-(d-1)(1/r-1/p)}{s+\nu-(d-1)(1/r-1/2)}.$$



and 



Take zi = r. 



~_ _ {p-r){{d-l)/2 + u)-rs 
s + i/-(d-l)(l/r-l/2) 



A < 



< 



< 



1 



1 



1/2 io 

L°°(^^+) / i=o ces, ' ^'^ 



1 



1/2 
L°°(H+) 



P-'' JO 



g24(-+('^-i)/2-(rf-i)/f)|(^--)]z)J (using (44)) 



i=o 



/x 



1/2 

L°°(H+) / 



2Jo[(^+(<i-l)/2-(d-l)/p)|(f-r)]^r^ 



(44) 
the last inequality holds because $\nu+(d-1)/2-(d-1)/p>0$; indeed, because we are in the sparse zone, $\nu+(d-1)/2>s/(p/r-1)=sr/(p-r)>(d-1)/(p-r)>(d-1)/p$. Taking

$$2^{j_0(\nu+(d-1)(1/2-1/p))p/\tilde r}\asymp\left(\left\|\frac{1}{f_X}\right\|^{1/2}_{L^\infty(H^+)}t_n\right)^{-1}$$

yields an upper bound of the order of

$$\left(\left\|\frac{1}{f_X}\right\|^{1/2}_{L^\infty(H^+)}t_n\right)^{p-\tilde r}$$

for $A$.

For B we take Z2 = r > f > r, 



B < 



< 



< 



< 



1 


1/2 


fx 


L°°{H+) 


1 


1/2 


fx 


L°°(J7+) 


1 


1/2 


fx 


L°°(H+) 


1 


1/2 


fx 





/ i=jo+i eeSj 

i V ^ 2j{'^+('^-l)(V2-l/p))p{r-r)/r^r (^ging (44)) 

/ i=io+i 
\ p-f 

tn M^. 



8.2 The p = ∞ case

We simply consider the case where $r=q=\infty$ and deduce the general case using Lemma 2 (ii).



8.2.1 The approximation error 

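As $f_\beta$ belongs to $B^s_{\infty,\infty}(M)$,

$$\|f^{I,a,J}_\beta-f_\beta\|_\infty\le\sum_{j>J}\Big\|\sum_{\xi\in\Xi_j}\beta^{I,a}_{j,\xi}\psi_{j,\xi}\Big\|_\infty\le MC_\infty\sum_{j>J}2^{j(d-1)/2}\,2^{-j(s+(d-1)/2)}\lesssim MC_\infty2^{-Js}.$$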



From the choice of $J$ and the fact that $\|1/f_X\|_{L^\infty(H^+)}\ge1$ (because $\operatorname{supp}f_X=H^+$), we get



E E 



< t. 



s/{v+(d-l)/2) 



this term is negligible because 



> 



v+(d-l)/2 — su+(d-l)/2- 
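The choice of the maximal resolution used in these proofs, read from Theorem 9 as the largest $J$ with $2^{J(\nu+(d-1)/2)}\|1/f_X\|^{1/2}_{L^\infty(H^+)}\le t_n^{-1}$ and $t_n=\sqrt{\log n/n}$, can be computed directly. In this sketch, `inv_fx_sup` stands for $\|1/f_X\|_{L^\infty(H^+)}$ and is assumed known, as in Section 4. The parameter values are arbitrary examples.

```python
import math


def max_resolution(n, nu, d, inv_fx_sup):
    """Largest integer J >= 0 with 2**(J*(nu + (d-1)/2)) * sqrt(inv_fx_sup) <= 1/t_n,
    where t_n = sqrt(log(n)/n). Returns 0 if even J = 0 violates the constraint."""
    t_n = math.sqrt(math.log(n) / n)
    a = nu + (d - 1) / 2
    bound = math.log2(1.0 / (t_n * math.sqrt(inv_fx_sup))) / a
    return max(0, math.floor(bound))


for n in (10**3, 10**4, 10**6):
    print(n, max_resolution(n, nu=1.5, d=3, inv_fx_sup=2.0))
```

A larger $\|1/f_X\|$ (a design closer to zero somewhere) lowers the admissible resolution, which is the mechanism behind the blow-up mentioned after Theorem 9.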
8.2.2 The $R_{1,\infty,z}$ and $R_{2,\infty,z}$ terms

Using the definition of the Besov norm, we obtain that

thus

With $\gamma>z/2+1$, which is satisfied when $2(\gamma-1)(1-1/r)>z$, $R_{1,\infty,z}$ is of lower order than the rate.

Due to the choice of $J$, the bracket term in the expression of $R_{2,\infty,z}$ in Theorem 8 is less than 1. Likewise, the second term in the expression of $b_{n,\infty,z,J,r}$ is of smaller order than the first term. The order of $b_{n,\infty,z,J,r}$ is $(\log n)^{z/2}$. Thus, this term is also of lower order than the rate when $r$ is such that $2(\gamma-1)(1-1/r)>z$.



8.2.3 The $O_{\infty,z}$ term

Note that here $a_{n,\infty,z,J}$ is of the order of a constant. We proceed as for the $O_{p,p}$ term in Section 8.1.3. Using (40), we obtain that, up to another constant (previously of the order of $1+o(1)$),



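$$
\sup\cdots + \sum\sup\cdots \le \Big\|\frac{1}{f_X}\Big\|_{L^\infty(H^+)}^{1/2}\cdots\, 2^{j(\cdots)} \sup_{\xi\in\Xi_j}\big|\beta^{I}_{j,\xi}\big|\cdots
$$
for arbitrary $z_1, z_2 \in [0, z]$. We thus need to upper bound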
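$$
\begin{aligned}
A + B ={}& \Big(\Big\|\frac{1}{f_X}\Big\|_{L^\infty(H^+)}^{1/2} t_n\Big)^{z-z_1} \sum_{j=0}^{j_0} 2^{j[\nu(z-z_1)+(d-1)z/2]} \sup_{\xi\in\Xi_j}\big|\beta^{I,a}_{j,\xi}\big|^{z_1} \\
&+ \Big(\Big\|\frac{1}{f_X}\Big\|_{L^\infty(H^+)}^{1/2} t_n\Big)^{z-z_2} \sum_{j=j_0+1}^{J} 2^{j[\nu(z-z_2)+(d-1)z/2]} \sup_{\xi\in\Xi_j}\big|\beta^{I,a}_{j,\xi}\big|^{z_2}
\end{aligned}
$$
for some well-chosen $0 \le j_0 \le J$, $z_1$ and $z_2$. Because $f$ belongs to $B^s_{\infty,\infty}(M)$, for all $z > 1$,
$$
\sup_{\xi\in\Xi_j}\big|\beta^{I,a}_{j,\xi}\big|^{z} \lesssim M^{z}\, 2^{-jz(s+(d-1)/2)}.
$$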



The result follows by using this upper bound in $A$ and $B$ and computing $A + B$ with $z_1 = 0$, $j_0$ such that $2^{j_0} \asymp t_n^{-1/(s+\nu+(d-1)/2)}$, and $z_2 = z$.
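To see how $z_1 = 0$ and $2^{j_0} \asymp t_n^{-1/(s+\nu+(d-1)/2)}$ balance $A$ and $B$, the sketch below evaluates both sums and divides them by $t_n^{zs/(s+\nu+(d-1)/2)}$. It is a simplification under stated assumptions: all constants, $M$ and $\|1/f_X\|_{L^\infty(H^+)}$ are set to 1, the Besov bound $2^{-jz_2(s+(d-1)/2)}$ replaces $\sup_\xi|\beta^{I,a}_{j,\xi}|^{z_2}$, $z_2 = z$ is taken as one admissible choice, and the parameter values are hypothetical. Both ratios stay bounded as $t_n \to 0$.

```python
import math

def balance_terms(t, s, nu, d, z, J=60):
    """Return (A / rate, B / rate) for z1 = 0, z2 = z and 2^{j0} of order t^{-1/(s+nu+(d-1)/2)}.

    Simplifying assumptions: constants, M and ||1/f_X|| equal 1, and
    sup_xi |beta_{j,xi}|^{z2} is replaced by its Besov bound 2^{-j z2 (s+(d-1)/2)}.
    """
    S = s + nu + (d - 1) / 2
    j0 = math.floor(-math.log2(t) / S)   # 2^{j0} <= t^{-1/S} < 2^{j0+1}
    if not 0 <= j0 < J:
        raise ValueError("need 0 <= j0 < J (take 0 < t < 1)")
    # A with z1 = 0: t^z * sum_{j<=j0} 2^{j[nu z + (d-1) z/2]}
    A = t ** z * sum(2.0 ** (j * (nu * z + (d - 1) * z / 2)) for j in range(j0 + 1))
    # B with z2 = z: the t-factor is t^0 = 1; terms 2^{j(d-1)z/2} * 2^{-jz(s+(d-1)/2)}
    B = sum(2.0 ** (j * (d - 1) * z / 2) * 2.0 ** (-j * z * (s + (d - 1) / 2))
            for j in range(j0 + 1, J + 1))
    rate = t ** (z * s / S)
    return A / rate, B / rate

for t in (1e-3, 1e-6, 1e-9):
    print(t, balance_terms(t, s=1.0, nu=1.0, d=3, z=2.0))
```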
9 Proof of Theorem 10



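The proof is a slight modification of the proof of Theorem 9, using the decomposition
$$
\big\|\hat f_\beta - f_\beta\big\|^z \le 3^{z-1}\Big(\big\|\hat f_\beta - f_\beta^{P,a,J}\big\|^z + \big\|f_\beta^{P,a,J} - f_\beta^{I,a,J}\big\|^z + \big\|f_\beta^{I,a,J} - f_\beta\big\|^z\Big)
$$
and the two following lemmas.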
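The factor $3^{z-1}$ in this decomposition comes from convexity of $x \mapsto x^z$ for $z \ge 1$: $(a+b+c)^z \le 3^{z-1}(a^z+b^z+c^z)$ for nonnegative $a, b, c$. A quick randomized check (illustrative only):

```python
import random

def three_term_bound_holds(a, b, c, z):
    """(a + b + c)^z <= 3^(z-1) (a^z + b^z + c^z) for a, b, c >= 0 and z >= 1.

    This is convexity of x -> x^z (Jensen with weights 1/3); it fails for z < 1.
    """
    lhs = (a + b + c) ** z
    rhs = 3.0 ** (z - 1) * (a ** z + b ** z + c ** z)
    return lhs <= rhs * (1 + 1e-12)   # small relative tolerance for rounding

def check_random(trials=1000, seed=0):
    """Test the inequality on random nonnegative triples and exponents z in [1, 5)."""
    rng = random.Random(seed)
    return all(
        three_term_bound_holds(rng.random(), rng.random(), rng.random(), 1 + 4 * rng.random())
        for _ in range(trials)
    )

print(check_random(), three_term_bound_holds(1.0, 0.0, 0.0, 0.5))
```

Equality holds when $a = b = c$, so the constant $3^{z-1}$ cannot be improved.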
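Lemma 16
$$
\big\|f_\beta^{P,a,J} - f_\beta^{I,a,J}\big\| \le \cdots \qquad \text{(45)}
$$
where $a_+ = \max(a, 0)$.

Proof.
$$
\big\|f_\beta^{P,a,J} - f_\beta^{I,a,J}\big\| \le B(d,p)\, 2^{J\nu} \cdots \quad \text{(Proposition 4)} \quad \le \cdots \Big\|\cdots\frac{1}{f_X}\Big\|_{L^{\cdots}(H^+)}
$$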



The conclusion follows from the continuity of the smoothed projections (Lemma 2.4 (c)) and the Nikolskii inequality (Proposition 2.5) of [26], from the Hölder inequality, and since $\ldots$ are odd. □

The constant $C_{\mathrm{proj}}$ could be taken independent of $p$: it is enough to take the uniform upper bound on the $L^1$ norm of the smoothed projection kernels with respect to one of their arguments, according to the Young inequality (see [15]).
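The remark on $C_{\mathrm{proj}}$ rests on a Young (Schur-type) inequality: if the $L^1$ norms of a kernel in each of its arguments are bounded by a common constant, the associated operator has that constant as a bound on every $L^p$, $1 \le p \le \infty$. The sketch below is only a discrete analogue with matrices and counting measure; it does not model the smoothed projection kernels themselves, and the random kernel is hypothetical.

```python
import random

def lp_norm(v, p):
    """l^p norm of a finite sequence (1 <= p < infinity)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def apply_kernel(K, f):
    """Discrete integral operator (K f)_i = sum_k K[i][k] f[k]."""
    return [sum(K[i][k] * f[k] for k in range(len(f))) for i in range(len(K))]

def young_constant(K):
    """Max of the largest absolute row sum and the largest absolute column sum.

    By the Schur test, ||K f||_p <= young_constant(K) * ||f||_p for every p >= 1,
    so this bound does not depend on p.
    """
    max_row = max(sum(abs(x) for x in row) for row in K)
    max_col = max(sum(abs(K[i][k]) for i in range(len(K))) for k in range(len(K[0])))
    return max(max_row, max_col)

rng = random.Random(1)
n = 20
K = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
f = [rng.gauss(0, 1) for _ in range(n)]
B = young_constant(K)
for p in (1, 1.5, 2, 4, 10):
    print(p, lp_norm(apply_kernel(K, f), p) / lp_norm(f, p), "<=", B)
```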

The following lemma is used in the analysis to relate the smoothness of the true function to that of the function in which a preliminary estimator of the design density is plugged in.



Lemma 17 If $f_\beta \in B^s_{r,q}(M)$ then, for any $\pi \ge 1$, $f_\beta^{P,a,J,r,\pi} \in B^s_{r,q}(M')$.

A maximal resolution $J$ should be imposed to obtain an additive term of the order of a constant. It depends on the quality of the estimation of $f_X$ and on the smallness of $f_X$ at certain points, through
$$
\Big\|\frac{\hat f_X}{f_X} - 1\Big\|.
$$



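Proof. As long as $j \le J$, $\big(f_\beta^{P,a,J} - \cdots\big) = \big(f_\beta^{P,a,j} - f_\beta, \psi_{j,\xi}\big)$; thus we get, with $J = j$, using Lemma 1 (iii),
$$
\begin{aligned}
\Big\|\Big(2^{j(s+(d-1)(1/2-1/r))}\cdots\Big)_j\Big\|_{\ell^q(\{0,\dots,J\})}
&\le C' \Big\|\Big(\cdots\big\|f_\beta^{P,a,j} - f_\beta^{I,a,j}\big\|\cdots\Big)_j\Big\|_{\ell^q(\{0,\dots,J\})} \\
&\le C \Big\|\Big(2^{j(s+\nu+(d-1)(1/\pi-1/r)_+)}\cdots\Big)_j\Big\|_{\ell^q(\{0,\dots,J\})},
\end{aligned}
$$
using Lemma 16. □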